
aaron (Aaron Schulz)
User

User Details

User Since
Oct 20 2014, 5:25 PM (279 w, 2 d)
Availability
Available
IRC Nick
AaronSchulz
LDAP User
Aaron Schulz
MediaWiki User
Aaron Schulz

Recent Activity

Yesterday

aaron added a comment to T245835: Remove 'profileoutput' debug channel from MediaWiki.

+1 for removing this.

Wed, Feb 26, 4:02 PM · MediaWiki-Core-Profiler, Performance-Team

Tue, Feb 25

aaron moved T240684: Test gutter pool failover in production and memcached 1.5.x from Inbox to Doing on the Performance-Team board.
Tue, Feb 25, 5:22 PM · Performance-Team, Patch-For-Review, Operations, serviceops

Mon, Feb 24

Pppery awarded T137900: Deal with poor edit stash hit rate due to Lua modules using {{REVISIONID}} a Dislike token.
Mon, Feb 24, 4:26 AM · Patch-For-Review, MediaWiki-Parser, Performance-Team, User-notice, Parsoid
Pppery awarded T235957: Change {{REVISIONID}} from number to "-" in wgMiserMode a Heartbreak token.
Mon, Feb 24, 4:24 AM · MW-1.35-notes (1.35.0-wmf.21; 2020-02-25), Performance-Team, User-notice, Parsing-Team, MediaWiki-Parser

Fri, Feb 21

aaron committed rEMAS99af79feeace: Fix IDatabase::upsert() call with bad unique key parameters (authored by aaron).
Fix IDatabase::upsert() call with bad unique key parameters
Fri, Feb 21, 12:50 AM

Tue, Feb 18

aaron created T245570: Duplicate entry 'ext.uls.pt-vector|en' for key 'PRIMARY'.
Tue, Feb 18, 11:02 PM · MW-1.35-notes (1.35.0-wmf.20; 2020-02-18), Wikimedia-production-error, MediaWiki-ResourceLoader, Performance-Team
aaron updated subscribers of T235456: Let Arc-Lamp store its trace "log" files in compressed format.

@dpifke Do you want to take this on since you're working in this area?

Tue, Feb 18, 8:42 PM · Arc-Lamp, Performance-Team
aaron placed T235456: Let Arc-Lamp store its trace "log" files in compressed format up for grabs.
Tue, Feb 18, 8:42 PM · Arc-Lamp, Performance-Team

Thu, Feb 13

aaron placed T236880: Document when to use different ILoadBalancer::get*Connection* methods up for grabs.
Thu, Feb 13, 9:23 PM · Performance-Team, Documentation, Core Platform Team Workboards (Clinic Duty Team), Wikimedia-Rdbms, MediaWiki-Documentation
aaron added a comment to T244776: Swift container for performance flame graphs (ArcLamp).

Compression seems doable. LZMA works well per https://phabricator.wikimedia.org/T235455#5837382 . arclamp-grep would have to change though; maybe grep(fname, search_string) could stream zipped log object contents to lzcat and loop through the resulting lines.
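
A minimal sketch of the streaming idea above, in Python, using the standard `lzma` module in place of a pipe to lzcat. The function name `grep_compressed` and the choice to take a file-like object are assumptions made for illustration; this is not arclamp-grep's actual interface.

```python
import io
import lzma


def grep_compressed(fileobj, search_string):
    """Yield decoded lines containing search_string from an LZMA/XZ stream.

    fileobj is any binary file-like object (hypothetical: for example the
    body of a downloaded log object). Lines are decompressed incrementally,
    so the whole log never has to be held in memory.
    """
    with lzma.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if search_string in line:
                yield line.rstrip("\n")


# Usage with an in-memory sample standing in for a stored log object.
sample = lzma.compress(b"main;foo;bar 3\nmain;baz 1\nmain;foo;qux 2\n")
matches = list(grep_compressed(io.BytesIO(sample), "foo"))
```

Because the generator reads line by line, the caller can stop early without decompressing the rest of the object.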

Thu, Feb 13, 1:19 AM · Performance-Team, Arc-Lamp, SRE-swift-storage

Wed, Feb 12

aaron added a comment to T228294: Cassandra PHP driver evaluation.

This vaguely reminds me of https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/238370/ . Though implementing KeyValueStore makes more sense indeed.

Wed, Feb 12, 10:35 AM · Core Platform Team, User-Eevans

Tue, Feb 11

aaron added a comment to T244633: Constant inheritance not detected in phan, blocking merge of a patch.

Filed as https://github.com/phan/phan/issues/3706

Tue, Feb 11, 11:25 PM · MW-1.35-notes (1.35.0-wmf.21; 2020-02-25), Patch-For-Review, Upstream, phan, Performance-Team, Release-Engineering-Team (CI & Testing services)
aaron added a comment to T244633: Constant inheritance not detected in phan, blocking merge of a patch.

I can reproduce this with master phan as well:

Tue, Feb 11, 9:43 PM · MW-1.35-notes (1.35.0-wmf.21; 2020-02-25), Patch-For-Review, Upstream, phan, Performance-Team, Release-Engineering-Team (CI & Testing services)
aaron added a comment to T236880: Document when to use different ILoadBalancer::get*Connection* methods.

+1, better documentation would be good. We've had a lot of complexity added to this code over the years.
As far as I can tell,

  • getConnection: Basic "connect to a DB" functionality.
    • Of note is the fact that if you pass a non-false $domain you're supposed to call reuseConnection() before you let it go out of scope, which is easy to forget.
  • getServerConnection: The differences from getConnection seem fairly well documented. The code in LoadBalancer clearly matches.
  • getAnyOpenConnection: Not very well documented. At the least, this one won't open a new connection (returning null if no connection is already opened), which can be useful if you know you can skip some operation if nothing else already connected to the DB. But there may be other differences as well, the implementation seems completely separate from getConnection/getServerConnection.
    • If the only difference is actually the "no new connection" behavior, ideally the implementation should reflect that by passing a flag to some internal method shared with getConnection or getServerConnection.
    • If there are other differences, which seems likely, and those differences shouldn't be fixed, I suspect this shouldn't be on the interface at all.
      • The three external callers only use it to get a DB_MASTER for locking, so perhaps something that returns a proxy that only exposes the locking-related methods would serve to reduce confusion.
  • getConnectionRef: Wraps the handle returned by getConnection with a proxy object that will automatically call reuseConnection() when it goes out of scope.
  • getLazyConnectionRef: Like getConnectionRef but additionally won't even call getConnection until the first time it's actually used. Good if you think it often won't wind up being used after all.
  • getMaintenanceConnectionRef: Like getConnectionRef but with the proxy implementing IMaintainableDatabase rather than just IDatabase.
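
The proxy behaviour described for getConnectionRef and getLazyConnectionRef can be sketched roughly as follows. This is Python rather than PHP, every class and method name here is hypothetical, and a context manager stands in for the PHP destructor that fires when the handle goes out of scope.

```python
class FakeLoadBalancer:
    """Hypothetical stand-in for a load balancer; counts opens and reuses."""

    def __init__(self):
        self.opened = 0
        self.reused = 0

    def get_connection(self):
        self.opened += 1
        return {"queries": []}

    def reuse_connection(self, conn):
        self.reused += 1


class LazyConnectionRef:
    """Opens the connection on first use and hands it back on exit."""

    def __init__(self, lb):
        self._lb = lb
        self._conn = None

    def query(self, sql):
        if self._conn is None:  # lazy: connect only when actually needed
            self._conn = self._lb.get_connection()
        self._conn["queries"].append(sql)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        if self._conn is not None:  # nothing to hand back if never used
            self._lb.reuse_connection(self._conn)
            self._conn = None
        return False


lb = FakeLoadBalancer()
with LazyConnectionRef(lb):
    pass                      # never used: no connection is opened
with LazyConnectionRef(lb) as ref:
    ref.query("SELECT 1")     # first use opens the connection
```

The caller never has to remember the hand-back step, which is the point of the ref variants over plain getConnection.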
Tue, Feb 11, 9:30 PM · Performance-Team, Documentation, Core Platform Team Workboards (Clinic Duty Team), Wikimedia-Rdbms, MediaWiki-Documentation
aaron added a comment to T244776: Swift container for performance flame graphs (ArcLamp).

Looks like hierdata/(swift|codfw)/params.yaml needs updating, along with the private puppet repo (beforehand).

Tue, Feb 11, 2:44 AM · Performance-Team, Arc-Lamp, SRE-swift-storage

Mon, Feb 10

aaron closed T44730: wfTempDir() should have better fallbacks as Resolved.

Closing per the above patch (unless some issue remains).

Mon, Feb 10, 11:01 PM · MW-1.28-release-notes, MW-1.27-release-notes, MW-1.28-release (WMF-deploy-2016-05-17_(1.28.0-wmf.2)), Patch-For-Review, Commons, Multimedia, MediaWiki-File-management
aaron moved T244058: Wiki diffs take over 15s to load from Doing to Radar on the Performance-Team board.
Mon, Feb 10, 9:28 PM · Performance-Team, Core Platform Team Workboards (Clinic Duty Team), serviceops, Operations, Wikimedia-production-error
aaron placed T244058: Wiki diffs take over 15s to load up for grabs.
Mon, Feb 10, 9:28 PM · Performance-Team, Core Platform Team Workboards (Clinic Duty Team), serviceops, Operations, Wikimedia-production-error
aaron added a comment to T231086: Picture from Commons not found from Singapore.

I think having swift-repl manually set X-Timestamp is doable now. It would work kind of like rsync can in that regard. This also works better when the direction is switched. Right now, I assume the codfw files tend to have higher timestamps, so switching would cause pointless writes due to the new source cluster having higher timestamped files than the new destination cluster. Since the timestamp is already stored anyway, this wouldn't add any metadata.

Mon, Feb 10, 8:48 PM · Performance-Team (Radar), User-fgiunchedi, Structured-Data-Backlog, Structured Data Engineering, Multimedia, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Patch-For-Review, Commons, MediaWiki-File-management, SRE-swift-storage, Traffic, Operations

Sat, Feb 8

aaron added a project to T244633: Constant inheritance not detected in phan, blocking merge of a patch: Performance-Team.
Sat, Feb 8, 4:43 AM · MW-1.35-notes (1.35.0-wmf.21; 2020-02-25), Patch-For-Review, Upstream, phan, Performance-Team, Release-Engineering-Team (CI & Testing services)
aaron created T244633: Constant inheritance not detected in phan, blocking merge of a patch.
Sat, Feb 8, 4:43 AM · MW-1.35-notes (1.35.0-wmf.21; 2020-02-25), Patch-For-Review, Upstream, phan, Performance-Team, Release-Engineering-Team (CI & Testing services)
aaron added a comment to T244058: Wiki diffs take over 15s to load.

Links to old (non-current) versions do not use the parser cache. This means that rendering will always require a full parse.

[CUT]

Some sort of parser caching could be considered for old links that get high traffic. We do not want to waste space on pcXXXX mariadb servers nor LRU flood memcached with large blobs, so there would have to be some "hotness" estimation logic, like a hitcounter.

Instead of caching, we should just rate-limit parsing of old revisions to N concurrent revisions per user or IP, probably via poolcounter.
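
As a rough illustration of limiting parses to N concurrent operations per user or IP, here is an in-process sketch in Python. It is not PoolCounter's API; the class `ConcurrencyLimiter` and its methods are hypothetical, and a real deployment would need a shared service rather than per-process state.

```python
import threading
from collections import defaultdict


class ConcurrencyLimiter:
    """Allow at most `limit` in-flight operations per key (user or IP)."""

    def __init__(self, limit):
        self._limit = limit
        self._active = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, key):
        with self._lock:
            if self._active[key] >= self._limit:
                return False  # caller should reject or queue the request
            self._active[key] += 1
            return True

    def release(self, key):
        with self._lock:
            if self._active[key] > 0:
                self._active[key] -= 1


limiter = ConcurrencyLimiter(limit=2)
results = [limiter.try_acquire("203.0.113.7") for _ in range(3)]
limiter.release("203.0.113.7")
after_release = limiter.try_acquire("203.0.113.7")
```

The third attempt is refused while two parses are in flight, and a slot opens again once one of them is released.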

Sat, Feb 8, 4:31 AM · Performance-Team, Core Platform Team Workboards (Clinic Duty Team), serviceops, Operations, Wikimedia-production-error

Thu, Feb 6

aaron committed rEHIE10cfe67a3719: Convert $wgMemc use to WANObjectCache (authored by aaron).
Convert $wgMemc use to WANObjectCache
Thu, Feb 6, 8:39 PM

Wed, Feb 5

aaron added a comment to T243598: Set tab-width in the base ruleset file.

I think https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/563289/ is running into this.

Wed, Feb 5, 9:35 AM · MediaWiki-Codesniffer

Tue, Feb 4

aaron added a comment to T236800: Ensure apcu incr/decr are atomic (Upgrade php-apcu).

@Krinkle we can push the new package to the canaries this week if you are ok with it

Tue, Feb 4, 8:09 PM · Performance-Team (Radar), MediaWiki-Cache, Core Platform Team, serviceops
aaron added a comment to T244058: Wiki diffs take over 15s to load.

Links to old (non-current) versions do not use the parser cache. This means that rendering will always require a full parse.

Tue, Feb 4, 7:38 PM · Performance-Team, Core Platform Team Workboards (Clinic Duty Team), serviceops, Operations, Wikimedia-production-error

Mon, Feb 3

aaron added a project to T243872: Slow random image queries in MachineVision: Performance-Team.
Mon, Feb 3, 8:50 PM · Product-Infrastructure-Team-Backlog (Kanban), Patch-For-Review, Performance-Team (Radar), Structured-Data-Backlog, MachineVision, Performance Issue

Wed, Jan 29

aaron created T243872: Slow random image queries in MachineVision.
Wed, Jan 29, 1:26 AM · Product-Infrastructure-Team-Backlog (Kanban), Patch-For-Review, Performance-Team (Radar), Structured-Data-Backlog, MachineVision, Performance Issue

Tue, Jan 28

aaron added a comment to T235455: Resolve arclamp disk exhaustion problem (Oct 2019).

I compressed a sample log file from today to see what kind of compression ratios we could get:

Tue, Jan 28, 11:51 PM · serviceops, Performance-Team, Arc-Lamp
aaron added a comment to T243619: Consider disallowing db->update() without condition.

IMO we should never encourage use of query(), that's a code smell for probable MySQLisms.

Tue, Jan 28, 10:37 PM · Performance-Team (Radar), Core Platform Team, Wikimedia-Rdbms
aaron added a comment to T243619: Consider disallowing db->update() without condition.

Not having conditions also implies not having a [...] deterministic outcome

How so? Our batching is generally for performance (i.e. avoiding locking the DB while updating millions of rows), as far as I know, not non-determinism.

Replication safety, determinism, query performance, and lock contention avoidance do often go hand-in-hand in my experience.
For a table like thing (id, foo, bar, baz), a write query that does not use the primary key, such as UPDATE thing SET bar=42 WHERE bar=x AND baz=y, is, as I understand it, considered an anti-pattern by our DBAs, in part because such queries are not replication-safe and indeed not (obviously) deterministic. I believe it would be an improvement to change such a query to first SELECT the identifiers and then perform the write by primary key. (Even without batching, although batching might make it better still.)
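
A small sketch of the select-then-write-by-primary-key pattern, using Python's built-in `sqlite3` so it runs standalone. The sample rows, the concrete values for x and y, and the batch size are made up for illustration; this is not MediaWiki's Rdbms API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thing (id INTEGER PRIMARY KEY, foo TEXT, bar INTEGER, baz INTEGER)"
)
conn.executemany(
    "INSERT INTO thing (id, foo, bar, baz) VALUES (?, ?, ?, ?)",
    [(1, "a", 7, 1), (2, "b", 7, 1), (3, "c", 7, 2), (4, "d", 8, 1)],
)

# Step 1: read the primary keys of the rows that match the condition.
ids = [row[0] for row in conn.execute(
    "SELECT id FROM thing WHERE bar = ? AND baz = ? ORDER BY id", (7, 1)
)]

# Step 2: write by primary key, one small batch per transaction.
BATCH_SIZE = 1  # deliberately tiny, only to show the batching loop
for start in range(0, len(ids), BATCH_SIZE):
    batch = ids[start:start + BATCH_SIZE]
    placeholders = ",".join("?" for _ in batch)
    conn.execute(
        f"UPDATE thing SET bar = 42 WHERE id IN ({placeholders})", batch
    )
    conn.commit()

updated = [row[0] for row in conn.execute(
    "SELECT id FROM thing WHERE bar = 42 ORDER BY id"
)]
```

Each UPDATE names exactly the rows it touches, so the set of affected rows is fixed at read time rather than re-evaluated on every replica.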

Tue, Jan 28, 10:35 PM · Performance-Team (Radar), Core Platform Team, Wikimedia-Rdbms
aaron moved T123582: Use "preconnect" resource hint for thumbnail host from Radar to Inbox on the Performance-Team board.
Tue, Jan 28, 10:19 PM · Patch-For-Review, Performance-Team, MediaWiki-General, Multimedia

Jan 23 2020

aaron closed T222691: Error while file deletion: "Explicit transaction still active." as Resolved.

Not seeing this in the logs anymore.

Jan 23 2020, 9:29 AM · Growth-Team, MediaWiki-Page-deletion, Wikimedia-production-error
aaron created T243492: RunSingleJob: MediaWiki::restInPeace: transaction round 'LinksUpdate::doUpdate' still running.
Jan 23 2020, 9:18 AM · WMF-JobQueue, Performance-Team

Jan 21 2020

aaron moved T243149: Increased latency in CODFW API and APP monitoring urls (~07:20 UTC 19 Jan 2020) from Inbox to Radar on the Performance-Team board.
Jan 21 2020, 9:25 PM · Performance-Team (Radar), serviceops, Operations
aaron added a comment to T243149: Increased latency in CODFW API and APP monitoring urls (~07:20 UTC 19 Jan 2020).

What user impact did it cause?

Jan 21 2020, 6:41 PM · Performance-Team (Radar), serviceops, Operations

Jan 20 2020

aaron added a comment to T243149: Increased latency in CODFW API and APP monitoring urls (~07:20 UTC 19 Jan 2020).

As long as there are any health checks that hit MediaWiki in codfw that involve DB access (pretty much any normal/special page view), then LoadMonitor::getServerStates is reachable (in the course of picking a DB to connect to). That seems expected to me.

Jan 20 2020, 11:49 PM · Performance-Team (Radar), serviceops, Operations

Jan 14 2020

aaron added a comment to T183993: Fix Warning: According to a SiteLinkLookup Q-item is linked when actually is unlinked or non-existent.

If there is more than one case (I checked and it seems there were two cases in the past 7 days), we need to make sure tools or Wikibase itself put some time between edits (I think this is done by Wikibase itself). We should ask the user what they used that made this happen.

Shouldn't chronology protector help us here?

Jan 14 2020, 11:28 PM · User-Addshore, Wikidata-Campsite, Wikidata

Jan 13 2020

aaron moved T129093: SHOW SLAVE STATUS as a health check should have a low timeout from Doing to Backlog: Small & Maintenance on the Performance-Team board.
Jan 13 2020, 9:21 PM · Patch-For-Review, Performance-Team, DBA, Wikimedia-Rdbms
aaron closed T229266: Language::uc/lc return type correctness and perf review as Resolved.
Jan 13 2020, 9:19 PM · MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), Performance-Team, Core Platform Team Workboards (Clinic Duty Team), MediaWiki-Internationalization

Jan 3 2020

aaron added a comment to T226337: SpecialConfirmEmail causes "MWException: CAS update failed on user_touched" from User.php.

These errors seem to be for Special:ConfirmEmail but the patch was for Special:ChangeEmail.

Jan 3 2020, 12:10 AM · Performance-Team (Radar), Core Platform Team, Availability, Wikimedia-production-error, MediaWiki-User-preferences

Dec 21 2019

Pppery awarded T235957: Change {{REVISIONID}} from number to "-" in wgMiserMode a Dislike token.
Dec 21 2019, 11:18 PM · MW-1.35-notes (1.35.0-wmf.21; 2020-02-25), Performance-Team, User-notice, Parsing-Team, MediaWiki-Parser

Dec 5 2019

aaron added a comment to T237477: Redis: Add support for TLS.

The RedisConnectionPool patch idea seems reasonable to me.

Dec 5 2019, 8:06 AM · Core Platform Team Workboards (Clinic Duty Team), MediaWiki-Cache, Patch-For-Review, Performance-Team (Radar)

Dec 2 2019

aaron moved T239095: Undeleting pages with many revision results in DB exceptions from Inbox to Radar on the Performance-Team board.
Dec 2 2019, 8:54 PM · Performance-Team (Radar), Growth-Team, Core Platform Team, MediaWiki-Page-deletion

Nov 27 2019

aaron added a comment to T238493: Frontend save timing regression on/after 30 October 2019.

@aaron have you started looking into this? Any leads?

Nov 27 2019, 7:33 PM · Wikimedia-Incident, Performance-Team

Nov 13 2019

aaron added a comment to T230813: Performance review for the MachineVision extension.

Whoever picks this up, please ping me for access to the required private repo.

I thought maybe the ticket was missing some updates that were communicated elsewhere, but from what I understand our team was unable to install the extension. The review thus far covered the backend and was based on static review of the code. The next step is to figure out a minimal way to install (part of) it for frontend review.

Nov 13 2019, 4:55 PM · Structured-Data-Backlog, Product-Infrastructure-Team-Backlog, MW-1.35-notes (1.35.0-wmf.8; 2019-11-26), Patch-For-Review, Performance-Team, MachineVision
aaron added a comment to T237708: Audit and improve page parsing time (2020?).

Are there any cache busting user preferences at play here?

Nov 13 2019, 8:44 AM · Parsing-Team, Performance-Team

Nov 4 2019

aaron added a project to T236412: Refactor BagOStuff to use a more storage/multi-DC aware interface hierarchy: Core Platform Team Workboards (Clinic Duty Team).
Nov 4 2019, 8:48 PM · Core Platform Team, Patch-For-Review, MediaWiki-Cache, Performance-Team

Oct 30 2019

aaron added a comment to T230813: Performance review for the MachineVision extension.

Aside from the things mentioned in the above patch, the overall code looks OK to me.

Oct 30 2019, 5:22 PM · Structured-Data-Backlog, Product-Infrastructure-Team-Backlog, MW-1.35-notes (1.35.0-wmf.8; 2019-11-26), Patch-For-Review, Performance-Team, MachineVision
aaron added a comment to T231086: Picture from Commons not found from Singapore.

swiftrepl is puppetized now to run an eqiad -> codfw sync once a week on Monday (without deletes).

Oct 30 2019, 3:33 PM · Performance-Team (Radar), User-fgiunchedi, Structured-Data-Backlog, Structured Data Engineering, Multimedia, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Patch-For-Review, Commons, MediaWiki-File-management, SRE-swift-storage, Traffic, Operations

Oct 24 2019

aaron created T236414: CPT review/work for MediaWiki caching class maintenance ramp-up.
Oct 24 2019, 5:20 PM · Performance-Team (Radar), User-Eevans, Core Platform Team Workboards (Clinic Duty Team)
aaron created T236412: Refactor BagOStuff to use a more storage/multi-DC aware interface hierarchy.
Oct 24 2019, 5:16 PM · Core Platform Team, Patch-For-Review, MediaWiki-Cache, Performance-Team
aaron renamed T235188: Preemptive refresh in getMultiWithSetCallback() and getMultiWithUnionSetCallback() pollutes cache from Some revisions' contents are incorrect in the cache - wrong contents shown in history & diffs to Preemptive refresh in getMultiWithSetCallback() and getMultiWithUnionSetCallback() pollutes cache.
Oct 24 2019, 5:05 PM · MediaWiki-Cache, MW-1.35-notes (1.35.0-wmf.18; 2020-02-04), serviceops, Core Platform Team Workboards (Clinic Duty Team), User-ArielGlenn, Patch-For-Review, affects-translatewiki.net
aaron added a comment to T234455: Decouple simple Memcached interface and support pipelined operations without dependency on PECL.

I just want it on the work board (I had a meeting with Erik/Bill) for tracking object cache review and work (we have the goal of getting CPT more involved in maintenance rather than just myself and Timo).

Oct 24 2019, 7:47 AM · Core Platform Team Workboards (Clinic Duty Team), MediaWiki-Cache, Patch-For-Review, Performance-Team (Radar)
aaron updated subscribers of T235705: Add BagOStuff metrics for read/write operations (bytes, key "class") grouped by type/ID.
Oct 24 2019, 7:44 AM · MediaWiki-Cache, Performance-Team

Oct 23 2019

aaron added a project to T234455: Decouple simple Memcached interface and support pipelined operations without dependency on PECL: Core Platform Team.
Oct 23 2019, 6:16 PM · Core Platform Team Workboards (Clinic Duty Team), MediaWiki-Cache, Patch-For-Review, Performance-Team (Radar)
aaron added a project to T235705: Add BagOStuff metrics for read/write operations (bytes, key "class") grouped by type/ID: Core Platform Team.
Oct 23 2019, 6:16 PM · MediaWiki-Cache, Performance-Team
aaron updated the task description for T235705: Add BagOStuff metrics for read/write operations (bytes, key "class") grouped by type/ID.
Oct 23 2019, 6:02 PM · MediaWiki-Cache, Performance-Team

Oct 16 2019

aaron created T235705: Add BagOStuff metrics for read/write operations (bytes, key "class") grouped by type/ID.
Oct 16 2019, 7:54 PM · MediaWiki-Cache, Performance-Team
aaron added a comment to T229686: #dbctl: manage 'externalLoads' data.

In db-eqiad/codfw.php we currently provide IP addresses as the values of the keys in externalLoads instead of using hostsByName to translate hostnames to IP, e.g.:

'externalLoads' => [
	# es2
	'cluster24' => [
		'10.64.32.184' => 0, # es1015, C2 11TB 128GB, master
		'10.64.0.6'    => 1, # es1011, A2 11TB 128GB
		'10.64.16.186' => 1, # es1013, B1 11TB 128GB
	],
]

However looking at the code path, I don't think there's any reason why this has to be so. I see in LBFactoryMulti.php that newExternalLB calls newLoadBalancer which calls makeServerArray which implements the translation: $serverInfo['host'] = $this->hostsByName[$serverName] ?? $serverName;
So I'm planning on emitting hostnames when I implement externalLoads output in dbctl, but wanted to verify with @Krinkle and @aaron that I understood the situation correctly. @Krinkle also suggested that if we are going to rely on this behavior, then it should be explicitly tested in mediawiki-core.
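
The fallback quoted above can be illustrated with a small Python sketch. The function and key names are hypothetical stand-ins rather than the LBFactoryMulti code, and the es1015 address is taken from the config excerpt above. Note that `dict.get` falls back only when the key is missing, whereas PHP's `??` also falls back on null.

```python
def make_server_array(loads, hosts_by_name):
    """Build server entries, translating names via hosts_by_name when possible.

    Mirrors the quoted fallback: use the mapped address if the name is
    known, otherwise pass the key through unchanged (e.g. a literal IP).
    """
    return [
        {"name": name, "host": hosts_by_name.get(name, name), "load": load}
        for name, load in loads.items()
    ]


hosts_by_name = {"es1015": "10.64.32.184"}
servers = make_server_array({"es1015": 0, "10.64.0.6": 1}, hosts_by_name)
```

A hostname key resolves through the map and an IP key passes through untouched, which is the behaviour a core test would need to pin down.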

Oct 16 2019, 3:08 AM · Performance-Team, DBA, conftool

Oct 11 2019

aaron closed T224422: Implement logic to filter bogus GTIDs, a subtask of T221159: FY18/19 TEC1.6 Q4: Improve or replace the usage of GTID_WAIT with pt-heartbeat in MW, as Resolved.
Oct 11 2019, 4:26 PM · Performance-Team (Radar), User-mobrovac, Services (watching), Goal, Wikimedia-Rdbms, DBA
aaron closed T224422: Implement logic to filter bogus GTIDs as Resolved.

So this is the latest news:
On T234948 we found that special page updates can lead to a constant 1-2 seconds of lag, caused by the confluence of a large long-running query with heavy load from the many s8 edits and the ongoing Wikibase refactoring. Because the timeout is only 1 second, chronology protector will fail frequently. However, because we allow pooling up to a limit of, I think, 5 seconds, the host is not depooled. Also, some jobs probably use the "slow" replica and try to use chronology protector, which leads to lots of errors. Obviously, increasing the timeout cannot be done lightly, as it could cause worse problems.
The good news is that I believe this was deployed successfully and works as intended. The bad news is that there is a weakness in our infrastructure, and I am not sure how to move forward; suggestions are needed. We can open discussion on another ticket, as this scope seems solved.

Oct 11 2019, 4:26 PM · MW-1.35-notes (1.35.0-wmf.2; 2019-10-15), MW-1.34-notes (1.34.0-wmf.25; 2019-10-01), Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (Multi-DC (TEC1)), Performance-Team, Services (watching), Wikimedia-Rdbms

Oct 10 2019

aaron added a comment to T224422: Implement logic to filter bogus GTIDs.

Indeed the logging is based on the *whole* raw unfiltered position...I should add a logstash key for the filtered one too.

Oct 10 2019, 8:49 PM · MW-1.35-notes (1.35.0-wmf.2; 2019-10-15), MW-1.34-notes (1.34.0-wmf.25; 2019-10-01), Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (Multi-DC (TEC1)), Performance-Team, Services (watching), Wikimedia-Rdbms

Oct 8 2019

aaron added a comment to T229062: Look into a simple way to have global keys with db-replicated.

@jcrespo @Marostegui What do you think of the idea of having another cluster of mysql servers set up just like the parser cache ones? That would be nice from an HA perspective and would avoid adding extra load to any existing DB cluster (e.g. the objectcache table of metawiki or extension1). Traffic would be modest given that it would start out being used for WikimediaEvents, LoginNotify, and perhaps AbuseFilter stats too (see https://docs.google.com/document/d/1tX8ekiYb3xYgpNJsmA1SiKqzkWc0F-_E4SGx6BI72vA/edit#heading=h.bdt9mhl3o7k5).

Oct 8 2019, 9:30 PM · Patch-For-Review, Performance-Team (Radar), MediaWiki-Cache

Oct 6 2019

aaron closed T224422: Implement logic to filter bogus GTIDs, a subtask of T221159: FY18/19 TEC1.6 Q4: Improve or replace the usage of GTID_WAIT with pt-heartbeat in MW, as Resolved.
Oct 6 2019, 1:38 AM · Performance-Team (Radar), User-mobrovac, Services (watching), Goal, Wikimedia-Rdbms, DBA
aaron closed T224422: Implement logic to filter bogus GTIDs as Resolved.
Oct 6 2019, 1:38 AM · MW-1.35-notes (1.35.0-wmf.2; 2019-10-15), MW-1.34-notes (1.34.0-wmf.25; 2019-10-01), Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (Multi-DC (TEC1)), Performance-Team, Services (watching), Wikimedia-Rdbms

Oct 2 2019

aaron updated the task description for T234455: Decouple simple Memcached interface and support pipelined operations without dependency on PECL.
Oct 2 2019, 9:38 PM · Core Platform Team Workboards (Clinic Duty Team), MediaWiki-Cache, Patch-For-Review, Performance-Team (Radar)

Sep 30 2019

aaron moved T224422: Implement logic to filter bogus GTIDs from Backlog: Future Goals to Doing on the Performance-Team board.
Sep 30 2019, 8:42 PM · MW-1.35-notes (1.35.0-wmf.2; 2019-10-15), MW-1.34-notes (1.34.0-wmf.25; 2019-10-01), Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (Multi-DC (TEC1)), Performance-Team, Services (watching), Wikimedia-Rdbms
aaron moved T218692: read only on mediawiki generates "LoadBalancer.php: Cannot access the database: Unknown error" from Doing to Backlog: Small & Maintenance on the Performance-Team board.
Sep 30 2019, 8:42 PM · Performance-Team, Core Platform Team Legacy (Watching / External), WMF-JobQueue, Wikimedia-Rdbms
aaron closed T228092: Hundreds of "PHP Warning: mysqli::query(): MySQL server has gone away" from the same web request as Resolved.

Not seeing this in the logs anymore.

Sep 30 2019, 8:41 PM · MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), Performance-Team, Wikimedia-Rdbms, Wikimedia-production-error
aaron added a comment to T157651: sql.php runs LoadExtensionSchemaUpdates.

@aaron are you requesting code review from Core Platform or do you need something else?

Sep 30 2019, 8:05 PM · Core Platform Team Workboards (Clinic Duty Team), Wikimedia-database-error, Patch-For-Review, Core Platform Team Legacy (Watching / External), Performance-Team, MediaWiki-Maintenance-scripts, Beta-Cluster-reproducible

Sep 18 2019

aaron added a comment to T233117: MediaWiki with sqlite lacks a CACHE_DB.

Seems like some kind of merge conflict.

Sep 18 2019, 2:21 AM · MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Performance-Team, MW-1.34-release, SQLite, MediaWiki-Cache, MediaWiki-Installer

Sep 12 2019

aaron closed T231162: DBQueryError from ExternalStoreDB::fetchBlob: Table 'enwiki.blobs' doesn't exist as Resolved.
Sep 12 2019, 7:24 AM · MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), Core Platform Team Workboards (Clinic Duty Team), MediaWiki-Revision-backend, Performance-Team

Sep 11 2019

aaron closed T232618: FlaggedRevs: PHP Notice: Undefined variable: fname as Resolved.
Sep 11 2019, 11:53 PM · MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), Performance-Team, MediaWiki-extensions-FlaggedRevs, Wikimedia-production-error

Sep 10 2019

aaron added a comment to T232487: 1.34.0-wmf.22 PHP Warning: curl_multi_setopt():Invalid curl multi configuration option.

Odd, the constant seems to be there.

Sep 10 2019, 11:32 PM · MW-1.34-notes (1.34.0-wmf.23; 2019-09-17), Core Platform Team Workboards (Clinic Duty Team), MediaWiki-General, Analytics, Wikimedia-production-error
aaron added a comment to T218207: Use disk-based LCStore by default in MediaWiki 1.35.

Interesting. In the specific case of SQLite, "cache in database" and "cache on disk" effectively both use the disk. Here are some quick comparisons using Quick MediaWiki to install MediaWiki with SQLite (macOS, on-disk /private/tmp/quickmw, PHP 7.1.26 from Homebrew).

SQLite (default installation)
LocalSettings.php (generated)
$wgLocalisationCacheConf['storeServer'] = [
	'type' => 'sqlite',
	'dbname' => "{$wgDBname}_l10n_cache",
	'tablePrefix' => '',
	'variables' => [ 'synchronous' => 'NORMAL' ],
	'dbDirectory' => $wgSQLiteDataDir,
	'trxMode' => 'IMMEDIATE',
	'flags' => 0
];
Test: Deploy one language
time php maintenance/rebuildLocalisationCache.php --lang de --force
real	0m0.404s, 0m0.416s, 0m0.407s

StaticArray

LocalSettings.php (appendix)
$wgCacheDirectory = $wgSQLiteDataDir;
$wgLocalisationCacheConf['store'] = 'array';
Test: Deploy one language
real	0m0.140s, 0m0.156s, 0m0.148s

Looks like Static Array beats SQLite as well. We've shown in all previous benchmarks that the "All languages" and "Page load time" use cases always align with the "One language" use case, so I won't bother re-running those. Besides, I don't think this would inform our decision here, as I don't think we should optimise the stock MW default for SQLite against MySQL and other RDBMS'es.
In the unlikely event someone finds that sqlite3-based writing or reading outperforms opcache-backed arrays, it will still work by default, and can be optimised by setting wgLocalisationCacheConf directly.

Sep 10 2019, 2:53 AM · Core Platform Team Workboards (Clinic Duty Team), Performance-Team (Radar), MW-1.35-release, Language-Team, MediaWiki-Internationalization

Sep 9 2019

aaron placed T157651: sql.php runs LoadExtensionSchemaUpdates up for grabs.
Sep 9 2019, 9:42 PM · Core Platform Team Workboards (Clinic Duty Team), Wikimedia-database-error, Patch-For-Review, Core Platform Team Legacy (Watching / External), Performance-Team, MediaWiki-Maintenance-scripts, Beta-Cluster-reproducible
aaron added a comment to T157651: sql.php runs LoadExtensionSchemaUpdates.

So, getting this test merged depends on redoing the wikibase schema hook application order for update.php. In CI, there seems to be a problem when it interacts with Flow hooks trying to make pages.

Sep 9 2019, 9:42 PM · Core Platform Team Workboards (Clinic Duty Team), Wikimedia-database-error, Patch-For-Review, Core Platform Team Legacy (Watching / External), Performance-Team, MediaWiki-Maintenance-scripts, Beta-Cluster-reproducible

Sep 5 2019

aaron added a project to T231162: DBQueryError from ExternalStoreDB::fetchBlob: Table 'enwiki.blobs' doesn't exist: Core Platform Team Workboards (Clinic Duty Team).
Sep 5 2019, 6:06 PM · MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), Core Platform Team Workboards (Clinic Duty Team), MediaWiki-Revision-backend, Performance-Team
aaron created T232128: Make MultiHttpClient use CURLMOPT_MAX_HOST_CONNECTIONS and reuse connections.
Sep 5 2019, 5:41 PM · MW-1.35-notes (1.35.0-wmf.1; 2019-10-08), MediaWiki-libs-HTTP, Performance-Team (Radar), Core Platform Team Workboards (Clinic Duty Team)
aaron closed T227838: Obsessive serverIsReadOnly() checking in MySQL as Resolved.

Should be fixed now.

Sep 5 2019, 5:32 PM · MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Performance-Team, Wikimedia-Rdbms

Aug 30 2019

aaron awarded T230979: CR+2 on MediaWiki for Aryeh Gregor (aka Simetrical) a Like token.
Aug 30 2019, 4:26 PM · MediaWiki-Gerrit-Group-Requests
aaron added a comment to T231443: Uncaught Wikimedia\Rdbms\DBUnexpectedError: Wikimedia\Rdbms\Database::close: mass commit/rollback of peer transaction required (DBO_TRX set).

It looks like WebStart.php sets ignore_user_abort() for POSTS and the major entry points have wfTransactionalTimeLimit() set for POSTS. In the case of module_deps updates for load.php, that's on GET.

Aug 30 2019, 5:36 AM · affects-translatewiki.net, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Language-Team (Language-2019-July-September), MediaWiki-General

Aug 29 2019

aaron added a comment to T231443: Uncaught Wikimedia\Rdbms\DBUnexpectedError: Wikimedia\Rdbms\Database::close: mass commit/rollback of peer transaction required (DBO_TRX set).

Client disconnects (HTTP 499) are interesting. Before the ignore_user_abort() call in doPostOutputShutdown(), I suppose it's possible to end up with errors like this (and long has been). https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/519741/ would help this particular case by avoiding DB writes.

Aug 29 2019, 5:49 AM · affects-translatewiki.net, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Language-Team (Language-2019-July-September), MediaWiki-General
aaron added a comment to T231086: Picture from Commons not found from Singapore.
Aug 29 2019, 5:10 AM · Performance-Team (Radar), User-fgiunchedi, Structured-Data-Backlog, Structured Data Engineering, Multimedia, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Patch-For-Review, Commons, MediaWiki-File-management, SRE-swift-storage, Traffic, Operations
aaron added a comment to T231443: Uncaught Wikimedia\Rdbms\DBUnexpectedError: Wikimedia\Rdbms\Database::close: mass commit/rollback of peer transaction required (DBO_TRX set).

I wonder if some entry point lacks proper shutdown.

Aug 29 2019, 4:04 AM · affects-translatewiki.net, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Language-Team (Language-2019-July-September), MediaWiki-General

Aug 28 2019

aaron committed rEGRAedacbd233f24: Rely on ParserCache instead of using $wgMainStash in a flakey way (authored by aaron).
Aug 28 2019, 3:46 PM
aaron created T231461: MediaWiki\Tests\Storage\NameTableStoreTest::testCacheRaceCondition failure.
Aug 28 2019, 3:12 PM · MediaWiki-Revision-backend
aaron added a comment to T227838: Obsessive serverIsReadOnly() checking in MySQL.

What is the value of apc.enable_cli? I don't seem to have that problem.

Aug 28 2019, 2:27 PM · MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Performance-Team, Wikimedia-Rdbms
aaron added a comment to T231110: bring swiftrepl back to life.

I do worry about the risk of data loss if swiftrepl is also deleting files based on container list differences.

Aug 28 2019, 8:02 AM · User-fgiunchedi, Commons, MediaWiki-File-management, SRE-swift-storage, Operations

Aug 26 2019

aaron added a comment to T218555: Provide access to WebRequest and associated information via a service object.

I'd love to have a simplified version of WebRequest as a service. One that would be useful for dealing with the issue that https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/532367/ is about. Optimization hacks like https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/526801/ could be avoided too. It could be injected with pathinfo/cookie settings, but would not deal with complex encoding stuff that uses $wgContLang and so on.

Aug 26 2019, 3:46 PM · TechCom, MediaWiki-ServiceContainer, CPT Initiatives (Decoupling (CDP2))
aaron closed T227838: Obsessive serverIsReadOnly() checking in MySQL as Resolved.
Aug 26 2019, 3:57 AM · MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Performance-Team, Wikimedia-Rdbms

Aug 25 2019

aaron closed T202116: LoadBalancer opening extra connections in different connection categories doesn't work with PHPUnit & temporary tables as Resolved.
Aug 25 2019, 9:37 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Performance-Team, MediaWiki-Core-Testing, Wikimedia-Rdbms, User-Addshore
aaron closed T225103: LBFactory destructor causes unexpected exception at shutdown as Resolved.
Aug 25 2019, 1:18 AM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Wikimedia-Rdbms, Performance-Team, Wikimedia-production-error

Aug 23 2019

aaron added a comment to T231086: Picture from Commons not found from Singapore.

Still, a file was only uploaded and no other operations were done. I'm not sure why the DB would commit if the file store failed in one of the FileBackendMultiwrite backends and 'replication' is 'sync'.

Aug 23 2019, 3:44 PM · Performance-Team (Radar), User-fgiunchedi, Structured-Data-Backlog, Structured Data Engineering, Multimedia, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Patch-For-Review, Commons, MediaWiki-File-management, SRE-swift-storage, Traffic, Operations
aaron added a comment to T231086: Picture from Commons not found from Singapore.

Isn't there a swiftrepl background process to fix this?

Aug 23 2019, 3:01 PM · Performance-Team (Radar), User-fgiunchedi, Structured-Data-Backlog, Structured Data Engineering, Multimedia, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Patch-For-Review, Commons, MediaWiki-File-management, SRE-swift-storage, Traffic, Operations

Aug 22 2019

aaron added a comment to T227758: [investigate] purging strategy.

Note that CdnCacheUpdate queues a second purge to happen X seconds later to help deal with replication lag (mediawiki-config sets $wgCdnReboundPurgeDelay to 11). If lag gets near that amount, then $wgCdnMaxageLagged will kick in.

Aug 22 2019, 1:42 AM · Wikidata-Bridge-Sprint-7, Wikidata-Bridge-Sprint-6, Wikidata

Aug 21 2019

aaron closed T229694: Warning: EchoModerationController::moderate: transaction round 'MWCallableUpdate::doUpdate' still running as Resolved.
Aug 21 2019, 8:22 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Core Platform Team Workboards (Clinic Duty Team), Regression, MediaWiki-General, Performance-Team
aaron closed T229456: Enable MYSQLI_CLIENT_FOUND_ROWS option for consistency with other RDBMS backends as Resolved.

This is significantly less useful than the old behavior. Affected_rows is typically used to either skip expensive cache purges when nothing actually changed, or signal to the user whether they actually managed to change something. (Why would a caller care about rows matched but not changed?)

Aug 21 2019, 7:25 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Performance-Team (Radar), Core Platform Team Workboards (Clinic Duty Team), Wikimedia-Rdbms
aaron closed T225957: Investigate front end saving timing regression starting April 20, 2019 as Resolved.

Seems to be resolved, likely by vary-revision refactoring from T226785.

Aug 21 2019, 5:47 PM · Performance-Team