
aaron (Aaron Schulz)
User

User Details

User Since
Oct 20 2014, 5:25 PM (251 w, 5 d)
Availability
Available
IRC Nick
AaronSchulz
LDAP User
Aaron Schulz
MediaWiki User
Aaron Schulz [ Global Accounts ]

Recent Activity

Yesterday

aaron added a comment to T230065: 'Wikimedia\Rdbms\DBQueryError' with message 'A database query error has occurred. Did you forget to run your application's database schema updater after upgrading?.

I don't think so.

Sat, Aug 17, 7:05 PM · Patch-For-Review, Performance-Team, Core Platform Team Workboards (Clinic Duty Team), Mediawiki-Rdbms, Wikimedia-production-error
aaron closed T226785: Phase out use of vary-revision with more specific flags and improve related logging as Resolved.
Sat, Aug 17, 1:17 AM · MW-1.34-notes (1.34.0-wmf.17; 2019-08-06), Core Platform Team Workboards (Clinic Duty Team), Performance-Team, MediaWiki-Parser

Thu, Aug 15

aaron added a comment to T229566: BagOStuff InvalidArgumentException from line 710.

Does this still occur?

Thu, Aug 15, 2:35 PM · Performance-Team, MediaWiki-Cache

Mon, Aug 12

aaron moved T229694: Warning: EchoModerationController::moderate: transaction round 'MWCallableUpdate::doUpdate' still running from Doing to Blocked or Needs-CR on the Performance-Team board.
Mon, Aug 12, 7:47 PM · Core Platform Team Workboards (Clinic Duty Team), Regression, MediaWiki-General, Patch-For-Review, Performance-Team
aaron moved T230025: Create HtmlCacheUpdater service class to normalize purging code from Doing to Blocked or Needs-CR on the Performance-Team board.
Mon, Aug 12, 7:46 PM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, User-Daniel, Performance-Team
aaron moved T202116: LoadBalancer opening extra connections in different connection categories doesn't work with PHPUnit & temporary tables from Inbox to Backlog: Small & Maintenance on the Performance-Team board.
Mon, Aug 12, 7:45 PM · Performance-Team, MediaWiki-Core-Testing, Mediawiki-Rdbms, User-Addshore
aaron moved T216496: Misleading "replica catching up" error when master DB is down from Inbox to Doing on the Performance-Team board.
Mon, Aug 12, 7:44 PM · Patch-For-Review, Performance-Team, patch-welcome, Mediawiki-Rdbms
aaron closed T228525: If JobQueueEventBus fails to send a job exception is left uncaught, a subtask of T225199: Fatal error during RecentChange::notifyEdit (deferred update) from ORES/RecentChangeSaveHookHandler, as Invalid.
Mon, Aug 12, 7:43 PM · Growth-Team (Current Sprint), MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Scoring-platform-team, WMF-JobQueue, ORES, Wikimedia-production-error
aaron closed T228525: If JobQueueEventBus fails to send a job exception is left uncaught as Invalid.

Per my comment above, this is the expected behavior.

Mon, Aug 12, 7:43 PM · Performance-Team, Core Platform Team Workboards (Clinic Duty Team), Core Platform Team (Needs Cleaning - Security, stability, performance and scalability (TEC1)), WMF-JobQueue, Wikimedia-production-error
aaron moved T230025: Create HtmlCacheUpdater service class to normalize purging code from Inbox to Doing on the Performance-Team board.
Mon, Aug 12, 7:43 PM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, User-Daniel, Performance-Team
aaron moved T230037: Create warmup procedure for MediaWiki app servers from Inbox to Backlog: Future Goals on the Performance-Team board.
Mon, Aug 12, 7:42 PM · serviceops, Performance-Team
aaron moved T230065: 'Wikimedia\Rdbms\DBQueryError' with message 'A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? from Inbox to Doing on the Performance-Team board.
Mon, Aug 12, 7:41 PM · Patch-For-Review, Performance-Team, Core Platform Team Workboards (Clinic Duty Team), Mediawiki-Rdbms, Wikimedia-production-error
aaron claimed T230065: 'Wikimedia\Rdbms\DBQueryError' with message 'A database query error has occurred. Did you forget to run your application's database schema updater after upgrading?.
Mon, Aug 12, 7:41 PM · Patch-For-Review, Performance-Team, Core Platform Team Workboards (Clinic Duty Team), Mediawiki-Rdbms, Wikimedia-production-error
aaron moved T230260: Page view triggers ResourceLoaderWikiModule db queries for enabled gadgets (from OutputPage) from Inbox to Radar on the Performance-Team board.
Mon, Aug 12, 7:39 PM · Performance-Team, Patch-For-Review, MediaWiki-ResourceLoader, MediaWiki-Cache, Gadgets
aaron added a comment to T51195: Drop filejournal table from WMF.

It's an optional table, not installed by update.php.

Mon, Aug 12, 6:26 PM · DBA, Performance-Team (Radar), MediaWiki-File-management

Fri, Aug 9

aaron updated subscribers of T229062: Look into a simple way to have global keys with db-replicated.
Fri, Aug 9, 4:46 PM · Performance-Team (Radar), MediaWiki-Cache
aaron updated the task description for T229062: Look into a simple way to have global keys with db-replicated.
Fri, Aug 9, 4:45 PM · Performance-Team (Radar), MediaWiki-Cache
aaron added a comment to T226167: audit public tables and make sure we dump them all.

They were obsoleted by flaggedrevs_statistics.

Fri, Aug 9, 5:54 AM · Patch-For-Review, Dumps-Generation

Thu, Aug 8

aaron closed T226432: Investigate use of vary-revision flags on group2 wikis as Resolved.

The remaining vary-revision instances are basic self-transclusions (https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/526157/ should handle those).

Thu, Aug 8, 11:03 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Patch-For-Review, Performance-Team
aaron closed T211220: Update Save Timing grafana dashboards to break down by content model as Resolved.
Thu, Aug 8, 10:15 PM · Performance-Team

Mon, Aug 5

aaron committed rEFLI5f2aaa110ee0: Convert FileImporterSuccessCache to "db-replicated" cache (authored by aaron).
Convert FileImporterSuccessCache to "db-replicated" cache
Mon, Aug 5, 3:22 PM

Fri, Aug 2

aaron created T229694: Warning: EchoModerationController::moderate: transaction round 'MWCallableUpdate::doUpdate' still running.
Fri, Aug 2, 7:53 PM · Core Platform Team Workboards (Clinic Duty Team), Regression, MediaWiki-General, Patch-For-Review, Performance-Team

Thu, Aug 1

aaron merged T229605: File pages are not created: Fatal exception of type "Wikimedia\Rdbms\DBQueryError" into T229589: PHP Notice: Undefined property: MediaWiki\Revision\RevisionRenderer::$wikiId.
Thu, Aug 1, 8:01 PM · MW-1.34-notes (1.34.0-wmf.16; 2019-07-30), Performance-Team, MediaWiki-Revision-backend, Wikisource, Wikimedia-production-error
aaron merged task T229605: File pages are not created: Fatal exception of type "Wikimedia\Rdbms\DBQueryError" into T229589: PHP Notice: Undefined property: MediaWiki\Revision\RevisionRenderer::$wikiId.
Thu, Aug 1, 8:01 PM · MediaWiki-Revision-backend, Wikimedia-production-error, video2commons, Commons, Wikimedia-database-error

Wed, Jul 31

aaron added a comment to T212881: addWiki.php broken creating ES tables.

Is https://phabricator.wikimedia.org/T212881#5195101 the error that still happens or is it the read-only one too?

Wed, Jul 31, 11:34 PM · Patch-For-Review, Core Platform Team Workboards (Clinic Duty Team), Performance-Team (Radar), MediaWiki-extensions-WikimediaMaintenance
aaron committed rERLS2efa2553f868: Cleanup use of IDatabase::affectedRows() (authored by aaron).
Cleanup use of IDatabase::affectedRows()
Wed, Jul 31, 3:56 PM
aaron added a comment to T219592: Frequent Echo DB_MASTER write queries on HTTP GET.

Jobs are fine...though this case is complicated since people want their "latest views" to be immediately reflected...so it would have to do something like WatchedItemStore.

Wed, Jul 31, 6:16 AM · CPT Initiatives (Multi-DC (TEC1)), Growth-Team, Notifications, Services (watching), Performance-Team (Radar), Availability (MediaWiki-MultiDC)
aaron added a comment to T212881: addWiki.php broken creating ES tables.

How much of this is unique from T205936 ?

Wed, Jul 31, 2:38 AM · Patch-For-Review, Core Platform Team Workboards (Clinic Duty Team), Performance-Team (Radar), MediaWiki-extensions-WikimediaMaintenance

Sat, Jul 27

aaron committed rESRD5898e5267daa: Cleaned up recache() to behave more like the parent method (authored by aaron).
Cleaned up recache() to behave more like the parent method
Sat, Jul 27, 11:16 PM

Thu, Jul 25

aaron created T229062: Look into a simple way to have global keys with db-replicated.
Thu, Jul 25, 9:38 PM · Performance-Team (Radar), MediaWiki-Cache

Tue, Jul 23

aaron added a comment to T227401: MediaWiki should query master instead of replica if replica is too lagged.

I wonder if this is fixed in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/519565/

Tue, Jul 23, 7:53 PM · Core Platform Team, MW-1.34-release, Mediawiki-Rdbms
aaron closed T212284: Fatal db error "Could not select database 'centralauth'" (sometimes also 'metawiki') as Resolved.

The logs for doSelectDomain() look quiet for the last 7 days.

Tue, Jul 23, 4:43 PM · Core Platform Team Workboards (Clinic Duty Team), MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Core Platform Team (Needs Cleaning - Security, stability, performance and scalability (TEC1)), Performance-Team (Radar), Services (next), Mediawiki-Rdbms, MediaWiki-extensions-CentralAuth, Wikimedia-production-error
aaron added a comment to T228749: AssembleUploadChunksJob triggers: SqlBagOStuff: tries to serialize closure.

959daa2ca44c039e72c8a9a5199d4c74dd05caba added the << $status->value = [ 'warnings' => $upload->checkWarnings() ]; >> line. It seems like checkWarnings() has all kinds of File objects inside of it potentially. Some callback could easily slip in given that.

Tue, Jul 23, 4:19 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Core Platform Team (Needs Cleaning - Code Health (TEC13)), Core Platform Team Workboards (Clinic Duty Team), Multimedia, MediaWiki-Uploading, Wikimedia-production-error

Mon, Jul 22

aaron added a comment to T212129: Use a multi-dc aware store for ObjectCache's MainStash if needed..

ObjectCache always mentioned getMainStashInstance() as "Ephemeral global storage". It was just supposed to *try harder* to be persistent than memcached (rdb snapshots, expectation that stuff can *probably* still be there a week later or so). The existence of redis evictions and consistent re-hashing on host failure making data disappear or go stale was well known at the time it was picked as the original "stash".

Mon, Jul 22, 8:10 PM · CPT Initiatives (Mainstash Multi-DC), MediaWiki-General, serviceops-radar, User-mobrovac, User-jijiki, Performance-Team (Radar), Operations
aaron added a comment to T228465: TorBlock maintenance failures on labweb hosts.

Fixed and confirmed

[0450][www-data@labweb1001:/]$ time /usr/local/bin/mwscript extensions/TorBlock/maintenance/loadExitNodes.php --wiki=labswiki --force
Successfully loaded 1206 exit nodes.
real	0m1.958s
user	0m0.572s
sys	0m0.212s

I guess the cron jobs should be removed now, is there a bug for that?

Mon, Jul 22, 7:27 PM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Patch-For-Review, MediaWiki-extensions-TorBlock

Fri, Jul 19

aaron added a comment to T228525: If JobQueueEventBus fails to send a job exception is left uncaught.

JobQueueException should be thrown from push(), with nothing catching it other than MWExceptionHandler or a site-specific caller. Typically, push() should be used pre-send, before preOutputCommit, so everything would just roll back anyway. Jobs pushed after that are enqueued during DeferrableUpdates (directly or indirectly via lazyPush()); in that case, DeferredUpdates should (already) catch any exceptions (not just job queue ones) and roll back on an update-by-update basis. The exceptions are logged in the DeferredUpdates channel (previously the Exception channel).
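
A minimal sketch of the update-by-update behaviour described in this comment, written in Python rather than MediaWiki's PHP. It is not taken from the task or from MediaWiki: FakeTransaction, run_deferred_updates, good_update and bad_push are hypothetical names used only to illustrate the idea that one failing update is rolled back and logged without undoing its neighbours.

```python
import logging

# Channel name borrowed from the comment; the logger setup itself is illustrative.
logger = logging.getLogger("DeferredUpdates")


class FakeTransaction:
    """Hypothetical stand-in for a per-update transaction round."""

    def __init__(self):
        self.committed = []
        self.pending = []

    def write(self, row):
        self.pending.append(row)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending = []

    def rollback(self):
        self.pending = []


def run_deferred_updates(updates, trx):
    """Run each update in isolation; a failure rolls back only that update."""
    failed = []
    for update in updates:
        try:
            update(trx)
            trx.commit()
        except Exception as exc:  # any exception, not just job queue ones
            trx.rollback()
            logger.error("Deferred update %s failed: %s", update.__name__, exc)
            failed.append(update.__name__)
    return failed


def good_update(trx):
    trx.write("row-a")


def bad_push(trx):
    # Simulates a job push that throws after a partial write.
    trx.write("row-b")
    raise RuntimeError("job queue unavailable")


trx = FakeTransaction()
failed = run_deferred_updates([good_update, bad_push, good_update], trx)
```

In this sketch the two good updates stay committed and only the failing one is discarded, which is the "expected behavior" the comment refers to.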

Fri, Jul 19, 7:47 PM · Performance-Team, Core Platform Team Workboards (Clinic Duty Team), Core Platform Team (Needs Cleaning - Security, stability, performance and scalability (TEC1)), WMF-JobQueue, Wikimedia-production-error
aaron added a comment to T228436: web request timeout after 200 seconds due to Wikimedia\Rdbms\LBFactory->__destruct() > Wikimedia\Rdbms\LBFactory->commitMasterChanges().

Also, the timeout exceptions themselves were redis, not LBFactory. The latter seemed to just have errors related to the improper shutdown.

Fri, Jul 19, 7:38 PM · Performance-Team, Mediawiki-Rdbms, Wikimedia-production-error

Jul 19 2019

aaron added a comment to T225103: LBFactory destructor causes unexpected exception at shutdown.

Is this still a train blocker?

Jul 19 2019, 6:11 AM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Mediawiki-Rdbms, Performance-Team, Wikimedia-production-error

Jul 18 2019

aaron added a comment to T51195: Drop filejournal table from WMF.

Dropping the field doesn't make sense, but dropping the whole table does. We do not use that class in production (and it is optional within MW core).

Jul 18 2019, 9:51 PM · DBA, Performance-Team (Radar), MediaWiki-File-management
aaron claimed T228465: TorBlock maintenance failures on labweb hosts.
Jul 18 2019, 9:41 PM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Patch-For-Review, MediaWiki-extensions-TorBlock
aaron added a parent task for T228465: TorBlock maintenance failures on labweb hosts: T220739: 1.34.0-wmf.14 deployment blockers.
Jul 18 2019, 8:39 PM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Patch-For-Review, MediaWiki-extensions-TorBlock
aaron added a subtask for T220739: 1.34.0-wmf.14 deployment blockers: T228465: TorBlock maintenance failures on labweb hosts.
Jul 18 2019, 8:39 PM · Release-Engineering-Team-TODO (201907), Release-Engineering-Team (Deployment services), Release, Train Deployments
aaron closed T228303: Redis exception connecting to "/var/run/nutcracker/redis_eqiad.sock": read error on connection, a subtask of T220739: 1.34.0-wmf.14 deployment blockers, as Resolved.
Jul 18 2019, 8:37 PM · Release-Engineering-Team-TODO (201907), Release-Engineering-Team (Deployment services), Release, Train Deployments
aaron closed T228303: Redis exception connecting to "/var/run/nutcracker/redis_eqiad.sock": read error on connection as Resolved.
Jul 18 2019, 8:36 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), serviceops, Operations, Performance-Team, MediaWiki-Cache, Wikimedia-production-error
aaron added a parent task for T225103: LBFactory destructor causes unexpected exception at shutdown: T220739: 1.34.0-wmf.14 deployment blockers.
Jul 18 2019, 8:36 PM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Mediawiki-Rdbms, Performance-Team, Wikimedia-production-error
aaron added a parent task for T228303: Redis exception connecting to "/var/run/nutcracker/redis_eqiad.sock": read error on connection: T220739: 1.34.0-wmf.14 deployment blockers.
Jul 18 2019, 8:36 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), serviceops, Operations, Performance-Team, MediaWiki-Cache, Wikimedia-production-error
aaron added subtasks for T220739: 1.34.0-wmf.14 deployment blockers: T225103: LBFactory destructor causes unexpected exception at shutdown, T228303: Redis exception connecting to "/var/run/nutcracker/redis_eqiad.sock": read error on connection.
Jul 18 2019, 8:36 PM · Release-Engineering-Team-TODO (201907), Release-Engineering-Team (Deployment services), Release, Train Deployments
aaron added a comment to T228436: web request timeout after 200 seconds due to Wikimedia\Rdbms\LBFactory->__destruct() > Wikimedia\Rdbms\LBFactory->commitMasterChanges().

The redis bug is at T228303

Jul 18 2019, 8:16 PM · Performance-Team, Mediawiki-Rdbms, Wikimedia-production-error
aaron merged T228436: web request timeout after 200 seconds due to Wikimedia\Rdbms\LBFactory->__destruct() > Wikimedia\Rdbms\LBFactory->commitMasterChanges() into T225103: LBFactory destructor causes unexpected exception at shutdown.
Jul 18 2019, 8:15 PM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Mediawiki-Rdbms, Performance-Team, Wikimedia-production-error
aaron merged task T228436: web request timeout after 200 seconds due to Wikimedia\Rdbms\LBFactory->__destruct() > Wikimedia\Rdbms\LBFactory->commitMasterChanges() into T225103: LBFactory destructor causes unexpected exception at shutdown.
Jul 18 2019, 8:15 PM · Performance-Team, Mediawiki-Rdbms, Wikimedia-production-error
aaron added a comment to T228436: web request timeout after 200 seconds due to Wikimedia\Rdbms\LBFactory->__destruct() > Wikimedia\Rdbms\LBFactory->commitMasterChanges().

The timeouts correspond with the redis problems:

Jul 18 2019, 8:14 PM · Performance-Team, Mediawiki-Rdbms, Wikimedia-production-error
aaron added a comment to T228436: web request timeout after 200 seconds due to Wikimedia\Rdbms\LBFactory->__destruct() > Wikimedia\Rdbms\LBFactory->commitMasterChanges().

The timeout aspect seems strange. The huge "idle" time increase at https://grafana.wikimedia.org/d/000000273/mysql sounds like the PageEditStash::parseAndCache() has an infinite timeout instead of 0 seconds (bug, it should be 0 as in non-blocking) and the parsing may have been slowed down for some reason, making more threads wait on the lock. Maybe the concurrent nutcracker issues were also affecting mcrouter (since the same hosts are used). Could also be something adding memcached write load: https://grafana.wikimedia.org/d/000000316/memcache?orgId=1&from=1563458818482&to=1563464680644 looks a little unusual, though not unlike the result of key version changes that happen from release to release (including the slow return to normal set() rate).
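
To illustrate the difference between an infinite lock timeout and a zero-second (non-blocking) one, here is a small Python sketch. It is not the PageEditStash code: parse_and_cache, _parse_lock and the dict-based cache are hypothetical, and a single in-process threading.Lock stands in for whatever lock the real method takes.

```python
import threading

# Hypothetical lock guarding one stash key; the real lock lives elsewhere.
_parse_lock = threading.Lock()


def parse_and_cache(parse, cache, key):
    """Parse and cache only if the lock is free; never wait for it."""
    # blocking=False is the zero-second timeout: give up immediately if the
    # lock is held, instead of queueing up behind a slow parse.
    if not _parse_lock.acquire(blocking=False):
        return False
    try:
        cache[key] = parse()
        return True
    finally:
        _parse_lock.release()


cache = {}

# Simulate another worker holding the lock: the call returns at once.
_parse_lock.acquire()
skipped = parse_and_cache(lambda: "html", cache, "page:1")
_parse_lock.release()

# With the lock free, the parse runs and the result is stored.
stored = parse_and_cache(lambda: "html", cache, "page:1")
```

With a blocking acquire in place of the non-blocking one, every caller would pile up waiting whenever parsing slows down, which matches the idle-time pattern the comment speculates about.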

Jul 18 2019, 8:09 PM · Performance-Team, Mediawiki-Rdbms, Wikimedia-production-error
aaron created T228468: Move stats updates from AuthManager::autoCreateUser() HTTP GET to the job queue.
Jul 18 2019, 7:46 PM · Availability (MediaWiki-MultiDC), Performance-Team
aaron added a comment to T225642: Allow async foreign set/delete WAN cache operations in mcrouter.

OK, replication for SET/DELETE seems fine on mw1261/mw2224 for me and the STORED/NOT_STORED and FOUND/NOT_FOUND replies are what I expect when using (no prefix, /otherdc/mw-wan, and /thisdc/mwwan).

Jul 18 2019, 7:07 PM · User-Elukey, Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations
aaron added a comment to T225642: Allow async foreign set/delete WAN cache operations in mcrouter.

Err, more PEBCAK . I put the * in the wrong spot...

Jul 18 2019, 7:01 PM · User-Elukey, Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations
aaron added a comment to T225642: Allow async foreign set/delete WAN cache operations in mcrouter.

So, I've noticed that on mw1261/mw2224 as *well* as plain old mwmaint1002,mwmaint2001, that broadcasting keys doesn't seem to work, e.g.:

Jul 18 2019, 6:49 PM · User-Elukey, Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations
aaron added a comment to T122546: Cache rollback edit counts shown on recent changes.

I guess it can go on our backlog.

Jul 18 2019, 1:37 AM · Performance-Team (Radar), Growth-Team, MediaWiki-Recent-changes

Jul 16 2019

aaron added a comment to T225642: Allow async foreign set/delete WAN cache operations in mcrouter.

Is there a codfw host with the patch applied?

Jul 16 2019, 7:09 PM · User-Elukey, Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations
aaron added a comment to T225642: Allow async foreign set/delete WAN cache operations in mcrouter.

In terms of what MediaWiki actually queries, you have the following cases from both eqiad and codfw:

a) Local getWithSetCallback() requests for (regular) value keys:
"get WANCache:v:elukey-test"
"add WANCache:v:elukey-test"
"cas WANCache:v:elukey-test"
Jul 16 2019, 6:19 PM · User-Elukey, Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations

Jul 15 2019

aaron added a comment to T227838: Obsessive serverIsReadOnly() checking in MySQL.

The relevant getWithSetCallback() call uses pcTTL, so there still shouldn't be many of these queries. Unless a large number of distinct connections were acquired. Not just that, but connections to different load balancer clusters.

Jul 15 2019, 5:47 PM · Patch-For-Review, Performance-Team, Mediawiki-Rdbms

Jul 12 2019

aaron added a comment to T227838: Obsessive serverIsReadOnly() checking in MySQL.

Is $wgMainCacheType set to CACHE_NONE ?

Jul 12 2019, 10:18 PM · Patch-For-Review, Performance-Team, Mediawiki-Rdbms
aaron added a comment to T214275: Deprecate the usage of nutcracker for memcached.

The latter should be doable, but the former seems a bit more complicated. Is there any plan to deprecate the labswiki infra and fold it into the appserver layer?

There definitely is such a plan, although it will be quite a while (maybe end of this FY?) before we're able to move forward.

Thanks! I think this is totally fine, there is no real rush to move everything away from nutcracker now.

@elukey I can do thumbor, not sure when yet though. I opened T221081 a while ago for it

Thanks a lot :)
@aaron what do you think? Would it be ok in your opinion to close this task?

Jul 12 2019, 5:39 PM · Wikimedia-General-or-Unknown, serviceops, Performance-Team (Radar), User-Elukey, Operations
aaron added a comment to T225642: Allow async foreign set/delete WAN cache operations in mcrouter.

For generic key testing, there is always:

Jul 12 2019, 5:35 PM · User-Elukey, Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations

Jul 11 2019

aaron closed T206288: Exception from LinksUpdate "Could not acquire lock for page" when a page is edited frequently as Resolved.

Not seeing this in the logs lately.

Jul 11 2019, 10:18 PM · Patch-For-Review, Performance-Team, MediaWiki-General, Wikimedia-production-error

Jul 10 2019

aaron added a comment to T212129: Use a multi-dc aware store for ObjectCache's MainStash if needed..

I think (1) is more useful and fills a needed gap of writes on GET/HEAD. What I've been doing in T227376 is trying to move things off the Stash that can easily enough use some other store. This narrows down the "problem space".

Jul 10 2019, 9:08 PM · CPT Initiatives (Mainstash Multi-DC), MediaWiki-General, serviceops-radar, User-mobrovac, User-jijiki, Performance-Team (Radar), Operations
aaron added a comment to T227665: Announce or revert ResultWrapper iteration change.

I strongly prefer 0-based.

Jul 10 2019, 7:12 PM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Performance-Team, Mediawiki-Rdbms
aaron created T227638: Apply updated YubiKey SSH keys for aaron.
Jul 10 2019, 7:47 AM · SRE-Access-Requests, Operations
aaron updated subscribers of T227376: Move callers away from getMainObjectStash() that do not need it.
Jul 10 2019, 5:59 AM · MW-1.34-notes (1.34.0-wmf.19; 2019-08-20), Wikimedia-General-or-Unknown, Patch-For-Review, Performance-Team

Jul 6 2019

aaron created T227376: Move callers away from getMainObjectStash() that do not need it.
Jul 6 2019, 6:17 AM · MW-1.34-notes (1.34.0-wmf.19; 2019-08-20), Wikimedia-General-or-Unknown, Patch-For-Review, Performance-Team

Jul 5 2019

aaron added a comment to T184529: Define a way to get a database connection based on a logical wiki ID..

I see wiki IDs as a type of "domain ID" that just uses two ASCII components, (dbname,prefix), neither using slashes, to avoid the ugliness of having things like "mysite?hnewswiki-en" appear in config or in "table_wiki" DB fields.

I think using strings for this at all is a big problem. And encoding any "real" information into these strings makes it even worse. Wiki IDs should be totally opaque identifiers, and we should have a class to model them. @Tgr and I discussed this at the hackathon, but it seems we didn't write it down. I made a ticket now, see T227305: Define a WikiID class for uniquely identifying wikis.

Jul 5 2019, 10:58 PM · CPT Initiatives (Cross-Wiki (CDP2)), User-Daniel, Mediawiki-Rdbms
aaron added a comment to T35409: Transient CDB read/write failures.

Isn't CDB a thing of the past? I.e. can't we close this now?

Jul 5 2019, 10:56 PM · CDB, MediaWiki-General

Jul 4 2019

aaron added a comment to T100585: "Uncommitted DB writes" errors are getting creepy.

Can this be closed now?

Jul 4 2019, 7:36 PM · Fundraising-Backlog, Recurring-Donations, MediaWiki-extensions-DonationInterface, Fundraising-Backlog-Old

Jul 1 2019

aaron closed T36156: SiteStatsInit::refresh() triggered inappropriately, caused downtime as Resolved.

The current code will never trigger this path if $wgMiserMode is set, which it is on production.

Jul 1 2019, 8:27 PM · Performance-Team (Radar), Core Platform Team, Wikimedia-Incident, Mediawiki-Rdbms
aaron claimed T226770: 10X increase in DBPerformance warnings on 1.34-wmf.10.
Jul 1 2019, 7:50 PM · MW-1.34-notes (1.34.0-wmf.13; 2019-07-09), Mediawiki-Rdbms, Performance-Team, Wikimedia-production-error

Jun 28 2019

aaron added a comment to T191035: MediaWiki core @Database tests failure with sqlite.

Is this still a problem?

Jun 28 2019, 9:34 PM · SQLite, MediaWiki-Core-Testing
aaron added a comment to T221987: Performance testing of RESTBagOStuff.

So, one way we could do this is with siege or something similar, hitting a Web page or API endpoint with a session cookie set. We could do a baseline comparison against production (?) and then get similar data for a staging service we set up for T222099.
@aaron we pulled you in for performance discussions before. Do you have any tools that you prefer or recommend for this kind of evaluation?

Jun 28 2019, 2:31 AM · Performance-Team (Radar), Core Platform Team Workboards (Green), CPT Initiatives (Session Management Service (CDP2))

Jun 27 2019

aaron created T226785: Phase out use of vary-revision with more specific flags and improve related logging.
Jun 27 2019, 11:09 PM · MW-1.34-notes (1.34.0-wmf.17; 2019-08-06), Core Platform Team Workboards (Clinic Duty Team), Performance-Team, MediaWiki-Parser
aaron added a comment to T218692: read only on mediawiki generates "LoadBalancer.php: Cannot access the database: Unknown error".

Not sure what to do with this. It seems like some kind of connectivity problem, not just the server being read-only.

Jun 27 2019, 8:34 AM · Performance-Team, Core Platform Team Legacy (Watching / External), WMF-JobQueue, Wikimedia-production-error, Mediawiki-Rdbms

Jun 26 2019

aaron created T226678: Fix infinite LoadBalancer loop if the master has DB non-zero read load.
Jun 26 2019, 10:47 PM · Mediawiki-Rdbms, Performance-Team
aaron created T226595: Refactor LoadBalancer connection pooling to be more efficient.
Jun 26 2019, 2:38 AM · Mediawiki-Rdbms, Patch-For-Review, Performance-Team

Jun 25 2019

aaron committed rELINTe52803ce98a9: Replace Database type hints with IDatabase (authored by aaron).
Replace Database type hints with IDatabase
Jun 25 2019, 8:42 PM

Jun 24 2019

aaron committed rEDBE53d91ea05be8: Use DB_REPLICA instead of DB_SLAVE (authored by aaron).
Use DB_REPLICA instead of DB_SLAVE
Jun 24 2019, 7:54 PM
aaron created T226432: Investigate use of vary-revision flags on group2 wikis.
Jun 24 2019, 5:18 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Patch-For-Review, Performance-Team

Jun 22 2019

aaron added a comment to T225103: LBFactory destructor causes unexpected exception at shutdown.

Seems related: rMW143333b172cd: rdbms: do not close the connection in LoadBalancerSingle::__destruct / https://gerrit.wikimedia.org/r/517350.
Not exactly the same though (commit is LoadBalancer, this bug is LBFactory), but might need a similar change?

Jun 22 2019, 5:57 PM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Mediawiki-Rdbms, Performance-Team, Wikimedia-production-error

Jun 19 2019

aaron renamed T221159: FY18/19 TEC1.6 Q4: Improve or replace the usage of GTID_WAIT with pt-heartbeat in MW from FY18/19 TEC1.6 Q4: Replace the usage of GTID_WAIT with pt-heartbeat in MW to FY18/19 TEC1.6 Q4: Improve or replace the usage of GTID_WAIT with pt-heartbeat in MW.
Jun 19 2019, 4:54 PM · Patch-For-Review, Performance-Team (Radar), User-mobrovac, Services (watching), Goal, Core Platform Team Legacy (Watching / External), Mediawiki-Rdbms, DBA
aaron closed T225655: PHP Warning "headers already sent" from MediaWiki::preOutputCommit during SpecialCentralAutoLogin as Resolved.
Jun 19 2019, 11:27 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Performance-Team, MediaWiki-General, Wikimedia-production-error

Jun 18 2019

aaron claimed T224422: Implement logic to filter bogus GTIDs.
Jun 18 2019, 3:36 PM · CPT Initiatives (Multi-DC (TEC1)), MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Patch-For-Review, Performance-Team, Services (watching), Mediawiki-Rdbms
aaron closed T191668: Define varargs in \IDatabase::buildLike() in a way phan can understand it as Resolved.
Jun 18 2019, 3:28 PM · Patch-For-Review, phan, Performance-Team, Wikimedia-production-error (Shared Build Failure), Mediawiki-Rdbms
aaron updated the task description for T215740: Create Icinga check for ArcLamp (xenon-log) service health.
Jun 18 2019, 10:09 AM · Arc-Lamp, observability, Wikimedia-Incident, Performance-Team

Jun 17 2019

aaron updated the task description for T225968: Per component/extension profiling of hooks and pre-send DeferredUpdates with Grafana dashboards.
Jun 17 2019, 8:37 PM · Performance-Team
aaron created T225969: Per template/module profiling with Grafana dashboards.
Jun 17 2019, 8:36 PM · Performance-Team
aaron created T225968: Per component/extension profiling of hooks and pre-send DeferredUpdates with Grafana dashboards.
Jun 17 2019, 8:33 PM · Performance-Team
aaron added a comment to T225775: Investigate purging expired watchlist items as a result of .

I assume a mixture of opportunistic (limited row count) purging of rows during existing watchlist row changes along with WHERE clause filtering on SELECT to ignore expired rows would work (e.g. similar to blocks and page protections).
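
The two halves of this suggestion can be sketched with Python's sqlite3 module. This is only an illustration under stated assumptions: the watchlist table layout, the wl_expiry column, and the select_active and purge_expired helpers are hypothetical and do not reflect the actual MediaWiki schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE watchlist ("
    "wl_id INTEGER PRIMARY KEY, wl_title TEXT, wl_expiry INTEGER)"
)
# wl_expiry is a made-up integer timestamp; NULL means "never expires".
rows = [("A", 100), ("B", 50), ("C", 60), ("D", None), ("E", 10)]
conn.executemany(
    "INSERT INTO watchlist (wl_title, wl_expiry) VALUES (?, ?)", rows
)


def select_active(conn, now):
    """Readers ignore expired rows via the WHERE clause, purged or not."""
    cur = conn.execute(
        "SELECT wl_title FROM watchlist "
        "WHERE wl_expiry IS NULL OR wl_expiry > ? ORDER BY wl_title",
        (now,),
    )
    return [r[0] for r in cur]


def purge_expired(conn, now, limit):
    """Opportunistically delete at most `limit` expired rows per call."""
    cur = conn.execute(
        "DELETE FROM watchlist WHERE wl_id IN ("
        "SELECT wl_id FROM watchlist "
        "WHERE wl_expiry IS NOT NULL AND wl_expiry <= ? LIMIT ?)",
        (now, limit),
    )
    return cur.rowcount


now = 70
active_before = select_active(conn, now)
first_purge = purge_expired(conn, now, limit=2)   # piggybacks on a write
active_after = select_active(conn, now)
second_purge = purge_expired(conn, now, limit=2)  # finishes the backlog
remaining = conn.execute("SELECT COUNT(*) FROM watchlist").fetchone()[0]
```

Because reads already filter on expiry, the purge can lag behind without affecting results; the row limit keeps each piggybacked delete cheap.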

Jun 17 2019, 8:10 PM · Performance-Team (Radar), Community-Tech
aaron created T225961: Add wiki group breakdown of backend save timing to grafana.
Jun 17 2019, 5:14 PM · Performance-Team
aaron created T225957: Investigate front end saving timing regression starting April 20, 2019.
Jun 17 2019, 4:59 PM · Performance-Team
aaron closed T220470: Investigate backend save timing regression starting at 2019-04-08 19:15:00 as Resolved.

This was likely due to an APC change. Filing a separate task for the 4/20 group 2 regression (which seems out of band for deployments).

Jun 17 2019, 4:56 PM · MW-1.34-notes (1.34.0-wmf.7; 2019-05-28), MediaWiki-Page-editing, Performance-Team
aaron added a comment to T208934: mcrouter does not remove a memcached shard from consistent hashing when timeouts happen.

Some sort of meeting sounds reasonable.

Jun 17 2019, 8:20 AM · Performance-Team (Radar), User-Elukey, MediaWiki-Cache, Operations

Jun 16 2019

aaron committed rEATHb77da7a5a465: Use DB_REPLICA instead of DB_SLAVE (authored by aaron).
Use DB_REPLICA instead of DB_SLAVE
Jun 16 2019, 10:37 PM
aaron committed rEPSO4c96ab681cac: Use DB_REPLICA instead of DB_SLAVE (authored by aaron).
Use DB_REPLICA instead of DB_SLAVE
Jun 16 2019, 10:04 AM
aaron committed rERPGf708bfd186dd: Use DB_REPLICA instead of DB_SLAVE (authored by aaron).
Use DB_REPLICA instead of DB_SLAVE
Jun 16 2019, 9:49 AM