
aaron (Aaron Schulz)
User

User Details

User Since
Oct 20 2014, 5:25 PM (260 w, 3 d)
Availability
Available
IRC Nick
AaronSchulz
LDAP User
Aaron Schulz
MediaWiki User
Aaron Schulz [ Global Accounts ]

Recent Activity

Yesterday

aaron created T235705: Add BagOStuff metrics for read/write operations (bytes, key "class") grouped by type/ID.
Wed, Oct 16, 7:54 PM · MediaWiki-Cache, Performance-Team
aaron added a comment to T229686: #dbctl: manage 'externalLoads' data.

In db-eqiad/codfw.php we currently provide IP addresses as the values of the keys in externalLoads instead of using hostsByName to translate hostnames to IP, e.g.:

'externalLoads' => [
	# es2
	'cluster24' => [
		'10.64.32.184' => 0, # es1015, C2 11TB 128GB, master
		'10.64.0.6'    => 1, # es1011, A2 11TB 128GB
		'10.64.16.186' => 1, # es1013, B1 11TB 128GB
	],
]

However looking at the code path, I don't think there's any reason why this has to be so. I see in LBFactoryMulti.php that newExternalLB calls newLoadBalancer which calls makeServerArray which implements the translation: $serverInfo['host'] = $this->hostsByName[$serverName] ?? $serverName;
So I'm planning on emitting hostnames when I implement externalLoads output in dbctl, but wanted to verify with @Krinkle and @aaron that I understood the situation correctly. @Krinkle also suggested that if we are going to rely on this behavior, then it should be explicitly tested in mediawiki-core.
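The fallback quoted above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the LBFactoryMulti code: `make_server_array` and the `hosts_by_name` mapping are hypothetical stand-ins, with the hostname/IP pairs taken from the comments in the config snippet. Note that `dict.get` only approximates PHP's `??`, which also falls back when the value is null.

```python
from typing import Dict, List

# Hypothetical mapping, assembled from the comments in the config snippet above.
hosts_by_name: Dict[str, str] = {
    "es1015": "10.64.32.184",
    "es1011": "10.64.0.6",
    "es1013": "10.64.16.186",
}


def make_server_array(loads: Dict[str, int], hosts: Dict[str, str]) -> List[dict]:
    """Build one server entry per key of a cluster's load map."""
    servers = []
    for server_name, load in loads.items():
        servers.append({
            "serverName": server_name,
            # Mirrors: $this->hostsByName[$serverName] ?? $serverName
            # A known hostname is translated; anything else passes through as-is.
            "host": hosts.get(server_name, server_name),
            "load": load,
        })
    return servers


# Keys given as hostnames are translated to IPs ...
by_name = make_server_array({"es1015": 0, "es1011": 1}, hosts_by_name)
# ... while keys that are already IPs are left untouched.
by_ip = make_server_array({"10.64.32.184": 0, "10.64.0.6": 1}, hosts_by_name)
```

Under these assumptions both configurations resolve to the same `host` values, which is why emitting hostnames instead of IPs should not change which servers are contacted.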

Wed, Oct 16, 3:08 AM · Performance-Team, DBA, conftool

Fri, Oct 11

aaron closed T224422: Implement logic to filter bogus GTIDs, a subtask of T221159: FY18/19 TEC1.6 Q4: Improve or replace the usage of GTID_WAIT with pt-heartbeat in MW, as Resolved.
Fri, Oct 11, 4:26 PM · Core Platform Team, Performance-Team (Radar), User-mobrovac, Services (watching), Goal, Core Platform Team Legacy (Watching / External), Wikimedia-Rdbms, DBA
aaron closed T224422: Implement logic to filter bogus GTIDs as Resolved.

So this is the latest news:
On T234948 we found that special page updates can cause a constant 1-2 seconds of lag when a large long-running query coincides with heavy load from the many s8 edits and the ongoing wikibase refactoring. Because the timeout is only 1 second, chronology protector will fail frequently. However, because we allow pooling up to a lag limit of, I think, 5 seconds, the host is not depooled. Also, some jobs probably use the "slow" replica and try to use chronology protector, which leads to lots of errors. Obviously, increasing the timeout cannot be done lightly, as it could cause worse problems.
The good news is that I believe this was deployed successfully and works as intended. The bad news is that there is a weakness in our infrastructure, and I am not sure how to move forward; I need suggestions. We can open the discussion on another ticket, as the scope of this one seems solved.
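The gap between the two thresholds mentioned above can be sketched as follows. This is a deliberately simplified model, not how chronology protector or the load balancer are implemented: `replica_status` is a hypothetical helper, the 1 second timeout comes from the comment, and the 5 second pooling limit is the commenter's own estimate.

```python
def replica_status(lag_seconds: float,
                   cp_timeout: float = 1.0,
                   max_pooled_lag: float = 5.0) -> dict:
    """Classify a replica by its steady replication lag (simplified model)."""
    return {
        # The replica stays in rotation while lag is within the pooling limit.
        "pooled": lag_seconds <= max_pooled_lag,
        # Simplification: a position wait only succeeds if lag fits in the timeout.
        "position_wait_ok": lag_seconds <= cp_timeout,
    }


# A replica with a constant 1.5 s of lag sits in the problematic band:
# still pooled, yet too slow for a 1 s position wait.
status = replica_status(1.5)
```

Any lag between the two thresholds yields a replica that keeps receiving traffic while position waits against it time out, which matches the error pattern described in the comment.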

Fri, Oct 11, 4:26 PM · MW-1.35-notes (1.35.0-wmf.2; 2019-10-15), MW-1.34-notes (1.34.0-wmf.25; 2019-10-01), Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (Multi-DC (TEC1)), Performance-Team, Services (watching), Wikimedia-Rdbms

Thu, Oct 10

aaron added a comment to T224422: Implement logic to filter bogus GTIDs.

Indeed the logging is based on the *whole* raw unfiltered position...I should add a logstash key for the filtered one too.

Thu, Oct 10, 8:49 PM · MW-1.35-notes (1.35.0-wmf.2; 2019-10-15), MW-1.34-notes (1.34.0-wmf.25; 2019-10-01), Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (Multi-DC (TEC1)), Performance-Team, Services (watching), Wikimedia-Rdbms

Tue, Oct 8

aaron added a comment to T229062: Look into a simple way to have global keys with db-replicated.

@jcrespo @Marostegui What do you think of the idea of having another cluster of mysql servers set up just like the parser cache ones? That would be nice from an HA perspective and would avoid adding extra load to any existing DB cluster (e.g. the objectcache table of metawiki or extension1). Traffic would be modest given that it would start out being used for WikimediaEvents, LoginNotify, and perhaps AbuseFilter stats too (see https://docs.google.com/document/d/1tX8ekiYb3xYgpNJsmA1SiKqzkWc0F-_E4SGx6BI72vA/edit#heading=h.bdt9mhl3o7k5).

Tue, Oct 8, 9:30 PM · Patch-For-Review, Performance-Team (Radar), MediaWiki-Cache

Sun, Oct 6

aaron closed T224422: Implement logic to filter bogus GTIDs, a subtask of T221159: FY18/19 TEC1.6 Q4: Improve or replace the usage of GTID_WAIT with pt-heartbeat in MW, as Resolved.
Sun, Oct 6, 1:38 AM · Core Platform Team, Performance-Team (Radar), User-mobrovac, Services (watching), Goal, Core Platform Team Legacy (Watching / External), Wikimedia-Rdbms, DBA
aaron closed T224422: Implement logic to filter bogus GTIDs as Resolved.
Sun, Oct 6, 1:38 AM · MW-1.35-notes (1.35.0-wmf.2; 2019-10-15), MW-1.34-notes (1.34.0-wmf.25; 2019-10-01), Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (Multi-DC (TEC1)), Performance-Team, Services (watching), Wikimedia-Rdbms

Wed, Oct 2

aaron updated the task description for T234455: Decouple simple Memcached interface and support pipelined operations without dependency on PECL.
Wed, Oct 2, 9:38 PM · MediaWiki-Cache, Patch-For-Review, Performance-Team (Radar)

Mon, Sep 30

aaron moved T224422: Implement logic to filter bogus GTIDs from Backlog: Future Goals to Doing on the Performance-Team board.
Mon, Sep 30, 8:42 PM · MW-1.35-notes (1.35.0-wmf.2; 2019-10-15), MW-1.34-notes (1.34.0-wmf.25; 2019-10-01), Core Platform Team Workboards (Clinic Duty Team), CPT Initiatives (Multi-DC (TEC1)), Performance-Team, Services (watching), Wikimedia-Rdbms
aaron moved T218692: read only on mediawiki generates "LoadBalancer.php: Cannot access the database: Unknown error" from Doing to Backlog: Small & Maintenance on the Performance-Team board.
Mon, Sep 30, 8:42 PM · Performance-Team, Core Platform Team Legacy (Watching / External), WMF-JobQueue, Wikimedia-Rdbms
aaron closed T228092: Hundreds of "PHP Warning: mysqli::query(): MySQL server has gone away" from the same web request as Resolved.

Not seeing this in the logs anymore.

Mon, Sep 30, 8:41 PM · MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), Performance-Team, Wikimedia-Rdbms, Wikimedia-production-error
aaron added a comment to T157651: sql.php runs LoadExtensionSchemaUpdates.

@aaron are you requesting code review from Core Platform or do you need something else?

Mon, Sep 30, 8:05 PM · Core Platform Team Workboards (Clinic Duty Team), Wikimedia-database-error, Patch-For-Review, Core Platform Team Legacy (Watching / External), Performance-Team, MediaWiki-Maintenance-scripts, Beta-Cluster-reproducible

Wed, Sep 18

aaron added a comment to T233117: MediaWiki with sqlite lacks a CACHE_DB.

Seems like some kind of merge conflict.

Wed, Sep 18, 2:21 AM · MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Performance-Team, MW-1.34-release, SQLite, MediaWiki-Cache, MediaWiki-Installer

Sep 12 2019

aaron closed T231162: DBQueryError from ExternalStoreDB::fetchBlob: Table 'enwiki.blobs' doesn't exist as Resolved.
Sep 12 2019, 7:24 AM · MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), Core Platform Team Workboards (Clinic Duty Team), MediaWiki-Revision-backend, Performance-Team

Sep 11 2019

aaron closed T232618: FlaggedRevs: PHP Notice: Undefined variable: fname as Resolved.
Sep 11 2019, 11:53 PM · MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), Performance-Team, MediaWiki-extensions-FlaggedRevs, Wikimedia-production-error

Sep 10 2019

aaron added a comment to T232487: 1.34.0-wmf.22 PHP Warning: curl_multi_setopt():Invalid curl multi configuration option.

Odd, the constant seems to be there.

Sep 10 2019, 11:32 PM · MW-1.34-notes (1.34.0-wmf.23; 2019-09-17), Core Platform Team Workboards (Clinic Duty Team), MediaWiki-General, Analytics, Wikimedia-production-error
aaron added a comment to T218207: Use disk-based LCStore by default in MediaWiki 1.35.

Interesting. In the specific case of SQLite, "cache in database" and "cache on disk" effectively both use the disk. Here are some quick comparisons using Quick MediaWiki to install MediaWiki with SQLite (macOS, on-disk /private/tmp/quickmw, PHP 7.1.26 from Homebrew).

SQLite (default installation)
LocalSettings.php (generated)
$wgLocalisationCacheConf['storeServer'] = [
	'type' => 'sqlite',
	'dbname' => "{$wgDBname}_l10n_cache",
	'tablePrefix' => '',
	'variables' => [ 'synchronous' => 'NORMAL' ],
	'dbDirectory' => $wgSQLiteDataDir,
	'trxMode' => 'IMMEDIATE',
	'flags' => 0
];
Test: Deploy one language
time php maintenance/rebuildLocalisationCache.php --lang de --force
real	0m0.404s, 0m0.416s, 0m0.407s

StaticArray

LocalSettings.php (appendix)
$wgCacheDirectory = $wgSQLiteDataDir;
$wgLocalisationCacheConf['store'] = 'array';
Test: Deploy one language
real	0m0.140s, 0m0.156s, 0m0.148s

Looks like Static Array beats SQLite as well. We've shown in all previous benchmarks that the "All languages" and "Page load time" use cases always align with the "One language" use case, so I won't bother re-running those. Besides, I don't think this would inform our decision here, as I don't think we should optimise the stock MW default for SQLite against MySQL and other RDBMSes.
In the unlikely event someone finds that sqlite3-based writing or reading outperforms opcache-backed arrays, it will still work by default, and can be optimised by setting wgLocalisationCacheConf directly.

Sep 10 2019, 2:53 AM · Performance-Team (Radar), MW-1.35-release, Core Platform Team, Language-Team, MediaWiki-Internationalization

Sep 9 2019

aaron placed T157651: sql.php runs LoadExtensionSchemaUpdates up for grabs.
Sep 9 2019, 9:42 PM · Core Platform Team Workboards (Clinic Duty Team), Wikimedia-database-error, Patch-For-Review, Core Platform Team Legacy (Watching / External), Performance-Team, MediaWiki-Maintenance-scripts, Beta-Cluster-reproducible
aaron added a comment to T157651: sql.php runs LoadExtensionSchemaUpdates.

So, getting this test merged depends on redoing the wikibase schema hook application order for update.php. In CI, there seems to be a problem when it interacts with Flow hooks trying to make pages.

Sep 9 2019, 9:42 PM · Core Platform Team Workboards (Clinic Duty Team), Wikimedia-database-error, Patch-For-Review, Core Platform Team Legacy (Watching / External), Performance-Team, MediaWiki-Maintenance-scripts, Beta-Cluster-reproducible

Sep 5 2019

aaron added a project to T231162: DBQueryError from ExternalStoreDB::fetchBlob: Table 'enwiki.blobs' doesn't exist: Core Platform Team Workboards (Clinic Duty Team).
Sep 5 2019, 6:06 PM · MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), Core Platform Team Workboards (Clinic Duty Team), MediaWiki-Revision-backend, Performance-Team
aaron created T232128: Make MultiHttpClient use CURLMOPT_MAX_HOST_CONNECTIONS and reuse connections.
Sep 5 2019, 5:41 PM · MW-1.35-notes (1.35.0-wmf.1; 2019-10-08), MediaWiki-libs-HTTP, Performance-Team (Radar), Core Platform Team Workboards (Clinic Duty Team)
aaron closed T227838: Obsessive serverIsReadOnly() checking in MySQL as Resolved.

Should be fixed now.

Sep 5 2019, 5:32 PM · MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Performance-Team, Wikimedia-Rdbms

Aug 30 2019

aaron awarded T230979: CR+2 on MediaWiki for Aryeh Gregor (aka Simetrical) a Like token.
Aug 30 2019, 4:26 PM · MediaWiki-Gerrit-Group-Requests
aaron added a comment to T231443: Uncaught Wikimedia\Rdbms\DBUnexpectedError: Wikimedia\Rdbms\Database::close: mass commit/rollback of peer transaction required (DBO_TRX set).

It looks like WebStart.php sets ignore_user_abort() for POSTS and the major entry points have wfTransactionalTimeLimit() set for POSTS. In the case of module_deps updates for load.php, that's on GET.

Aug 30 2019, 5:36 AM · affects-translatewiki.net, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Language-Team (Language-2019-July-September), MediaWiki-General

Aug 29 2019

aaron added a comment to T231443: Uncaught Wikimedia\Rdbms\DBUnexpectedError: Wikimedia\Rdbms\Database::close: mass commit/rollback of peer transaction required (DBO_TRX set).

Client disconnects (HTTP 499) are interesting...before the ignore_user_abort() in doPostOutputShutdown(), I suppose it's possible to end up with stuff like this (and long has been). https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/519741/ would help this particular case by avoiding DB writes.

Aug 29 2019, 5:49 AM · affects-translatewiki.net, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Language-Team (Language-2019-July-September), MediaWiki-General
aaron added a comment to T231086: Picture from Commons not found from Singapore.
Aug 29 2019, 5:10 AM · User-fgiunchedi, Structured-Data-Backlog, Structured Data Engineering, Multimedia, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Patch-For-Review, Commons, MediaWiki-File-management, media-storage, Traffic, Operations
aaron added a comment to T231443: Uncaught Wikimedia\Rdbms\DBUnexpectedError: Wikimedia\Rdbms\Database::close: mass commit/rollback of peer transaction required (DBO_TRX set).

I wonder if some entry point lacks proper shutdown.

Aug 29 2019, 4:04 AM · affects-translatewiki.net, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Language-Team (Language-2019-July-September), MediaWiki-General

Aug 28 2019

aaron committed rEGRAedacbd233f24: Rely on ParserCache instead of using $wgMainStash in a flakey way (authored by aaron).
Rely on ParserCache instead of using $wgMainStash in a flakey way
Aug 28 2019, 3:46 PM
aaron created T231461: MediaWiki\Tests\Storage\NameTableStoreTest::testCacheRaceCondition failure.
Aug 28 2019, 3:12 PM · MediaWiki-Revision-backend
aaron added a comment to T227838: Obsessive serverIsReadOnly() checking in MySQL.

What is the value of apc.enable_cli ? I don't seem to have that problem.

Aug 28 2019, 2:27 PM · MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Performance-Team, Wikimedia-Rdbms
aaron added a comment to T231110: bring swiftrepl back to life.

I do worry about the risk of data loss if swiftrepl is also deleting files based on container list differences.

Aug 28 2019, 8:02 AM · User-fgiunchedi, Commons, MediaWiki-File-management, media-storage, Operations

Aug 26 2019

aaron added a comment to T218555: Provide access to WebRequest and associated information via a service object.

I'd love to have a simplified version of WebRequest as a service. One that would be useful for dealing with the issue that https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/532367/ is about. Optimization hacks like https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/526801/ could be avoided too. It could be injected with pathinfo/cookie settings, but would not deal with complex encoding stuff that uses $wgContLang and so on.

Aug 26 2019, 3:46 PM · TechCom, MediaWiki-ServiceContainer, CPT Initiatives (Decoupling (CDP2))
aaron closed T227838: Obsessive serverIsReadOnly() checking in MySQL as Resolved.
Aug 26 2019, 3:57 AM · MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Performance-Team, Wikimedia-Rdbms

Aug 25 2019

aaron closed T202116: LoadBalancer opening extra connections in different connection categories doesn't work with PHPUnit & temporary tables as Resolved.
Aug 25 2019, 9:37 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Performance-Team, MediaWiki-Core-Testing, Wikimedia-Rdbms, User-Addshore
aaron closed T225103: LBFactory destructor causes unexpected exception at shutdown as Resolved.
Aug 25 2019, 1:18 AM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Wikimedia-Rdbms, Performance-Team, Wikimedia-production-error

Aug 23 2019

aaron added a comment to T231086: Picture from Commons not found from Singapore.

Still, a file was only uploaded, and no other operations done...I'm not sure why the DB would commit if the file store failed in one of the FileBackendMultiwrite backends and 'replication' is 'sync'...

Aug 23 2019, 3:44 PM · User-fgiunchedi, Structured-Data-Backlog, Structured Data Engineering, Multimedia, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Patch-For-Review, Commons, MediaWiki-File-management, media-storage, Traffic, Operations
aaron added a comment to T231086: Picture from Commons not found from Singapore.

Isn't there a swiftrepl background process to fix this?

Aug 23 2019, 3:01 PM · User-fgiunchedi, Structured-Data-Backlog, Structured Data Engineering, Multimedia, MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Patch-For-Review, Commons, MediaWiki-File-management, media-storage, Traffic, Operations

Aug 22 2019

aaron added a comment to T227758: [investigate] purging strategy.

Note that CdnCacheUpdate queues a purge to happen X seconds later to help deal with lag (mediawiki-config has $wgCdnReboundPurgeDelay at 11). If lag gets near that amount, then $wgCdnMaxageLagged will kick in.
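A rough sketch of the two mechanisms mentioned above, in Python for illustration only. The delay of 11 seconds is the value quoted in the comment; `purge_schedule`, `choose_maxage`, the max-age values, and the exact lag threshold are all hypothetical and are not taken from MediaWiki itself.

```python
from typing import List

REBOUND_PURGE_DELAY = 11  # seconds; the $wgCdnReboundPurgeDelay value quoted above


def purge_schedule(edit_time: float,
                   rebound_delay: float = REBOUND_PURGE_DELAY) -> List[float]:
    """Times at which a URL is purged: immediately, then once more after the delay."""
    if rebound_delay > 0:
        return [edit_time, edit_time + rebound_delay]
    return [edit_time]


def choose_maxage(replica_lag: float, normal_maxage: int,
                  lagged_maxage: int, lag_threshold: float) -> int:
    """Pick a shorter CDN max-age once lag reaches the (assumed) threshold."""
    return lagged_maxage if replica_lag >= lag_threshold else normal_maxage


# The rebound purge covers pages rendered from a replica that was up to 11 s behind.
schedule = purge_schedule(100.0)

# Hypothetical max-age values: a long normal TTL and a short one for lagged renders.
fresh = choose_maxage(2.0, normal_maxage=3600, lagged_maxage=10, lag_threshold=11)
stale = choose_maxage(12.0, normal_maxage=3600, lagged_maxage=10, lag_threshold=11)
```

The idea is that the second purge cleans up any stale copy cached during the lag window, while the shorter max-age limits the damage when lag exceeds what the rebound purge can cover.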

Aug 22 2019, 1:42 AM · Wikidata-Bridge-Sprint-7, Wikidata-Bridge-Sprint-6, Wikidata

Aug 21 2019

aaron closed T229694: Warning: EchoModerationController::moderate: transaction round 'MWCallableUpdate::doUpdate' still running as Resolved.
Aug 21 2019, 8:22 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Core Platform Team Workboards (Clinic Duty Team), Regression, MediaWiki-General, Performance-Team
aaron closed T229456: Enable MYSQLI_CLIENT_FOUND_ROWS option for consistency with other RDBMS backends as Resolved.

This is significantly less useful than the old behavior. Affected_rows is typically used to either skip expensive cache purges when nothing actually changed, or signal to the user whether they actually managed to change something. (Why would a caller care about rows matched but not changed?)
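The difference between "rows matched" and "rows changed" can be seen with a small experiment. The sketch below uses Python's built-in `sqlite3` module as a stand-in, since SQLite counts every row an UPDATE matches; it does not exercise MYSQLI_CLIENT_FOUND_ROWS itself, and the `page` table, its columns, and `should_purge` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, touched TEXT)")
conn.execute("INSERT INTO page (id, touched) VALUES (1, 'a')")

# A no-op update: the row matches the WHERE clause but its value does not change.
# SQLite still counts it, i.e. it reports rows matched.
matched = conn.execute(
    "UPDATE page SET touched = 'a' WHERE id = 1"
).rowcount

# One way to recover "rows actually changed" under matched-row semantics:
# exclude rows that already hold the target value.
changed = conn.execute(
    "UPDATE page SET touched = 'a' WHERE id = 1 AND touched != 'a'"
).rowcount


def should_purge(affected_rows: int) -> bool:
    """Skip the expensive cache purge when the write changed nothing."""
    return affected_rows > 0
```

With matched-row semantics a caller that wants to skip purges on no-op writes has to put the "did it change" condition into the query, as in the second statement.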

Aug 21 2019, 7:25 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Performance-Team (Radar), Core Platform Team Workboards (Clinic Duty Team), Wikimedia-Rdbms
aaron closed T225957: Investigate front end saving timing regression starting April 20, 2019 as Resolved.

Seems to be resolved, likely by vary-revision refactoring from T226785.

Aug 21 2019, 5:47 PM · Performance-Team

Aug 20 2019

aaron closed T216496: Misleading "replica catching up" error when master DB is down as Resolved.
Aug 20 2019, 6:01 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Performance-Team, patch-welcome, Wikimedia-Rdbms

Aug 19 2019

aaron committed rECAC5cc799b00759: Switch to using BagOStuff::incrWithInit() (authored by aaron).
Switch to using BagOStuff::incrWithInit()
Aug 19 2019, 10:48 PM

Aug 17 2019

aaron added a comment to T230065: DBQueryError "Commands out of sync" from Rdbms\Database::close.

I don't think so.

Aug 17 2019, 7:05 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Performance-Team, Wikimedia-Rdbms, Wikimedia-production-error
aaron closed T226785: Phase out use of vary-revision with more specific flags and improve related logging as Resolved.
Aug 17 2019, 1:17 AM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Core Platform Team Workboards (Clinic Duty Team), Performance-Team, MediaWiki-Parser

Aug 15 2019

aaron added a comment to T229566: BagOStuff InvalidArgumentException from line 710.

Does this still occur?

Aug 15 2019, 2:35 PM · affects-translatewiki.net, Performance-Team, MediaWiki-Cache

Aug 12 2019

aaron moved T229694: Warning: EchoModerationController::moderate: transaction round 'MWCallableUpdate::doUpdate' still running from Doing to Blocked or Needs-CR on the Performance-Team board.
Aug 12 2019, 7:47 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Core Platform Team Workboards (Clinic Duty Team), Regression, MediaWiki-General, Performance-Team
aaron moved T230025: Create HtmlCacheUpdater service class to normalize purging code from Doing to Blocked or Needs-CR on the Performance-Team board.
Aug 12 2019, 7:46 PM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, User-Daniel, Performance-Team
aaron moved T202116: LoadBalancer opening extra connections in different connection categories doesn't work with PHPUnit & temporary tables from Inbox to Backlog: Small & Maintenance on the Performance-Team board.
Aug 12 2019, 7:45 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Performance-Team, MediaWiki-Core-Testing, Wikimedia-Rdbms, User-Addshore
aaron moved T216496: Misleading "replica catching up" error when master DB is down from Inbox to Doing on the Performance-Team board.
Aug 12 2019, 7:44 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Performance-Team, patch-welcome, Wikimedia-Rdbms
aaron closed T228525: If JobQueueEventBus fails to send a job exception is left uncaught, a subtask of T225199: Fatal error during RecentChange::notifyEdit (deferred update) from ORES/RecentChangeSaveHookHandler, as Invalid.
Aug 12 2019, 7:43 PM · Growth-Team (Current Sprint), MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Scoring-platform-team, WMF-JobQueue, ORES, Wikimedia-production-error
aaron closed T228525: If JobQueueEventBus fails to send a job exception is left uncaught as Invalid.

Per my comment above, this is the expected behavior.

Aug 12 2019, 7:43 PM · Performance-Team, Core Platform Team Workboards (Clinic Duty Team), Core Platform Team (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), WMF-JobQueue, Wikimedia-production-error
aaron moved T230025: Create HtmlCacheUpdater service class to normalize purging code from Inbox to Doing on the Performance-Team board.
Aug 12 2019, 7:43 PM · Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review, User-Daniel, Performance-Team
aaron moved T230037: Create warmup procedure for MediaWiki app servers from Inbox to Backlog: Future Goals on the Performance-Team board.
Aug 12 2019, 7:42 PM · Release-Engineering-Team, serviceops, Performance-Team
aaron moved T230065: DBQueryError "Commands out of sync" from Rdbms\Database::close from Inbox to Doing on the Performance-Team board.
Aug 12 2019, 7:41 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Performance-Team, Wikimedia-Rdbms, Wikimedia-production-error
aaron claimed T230065: DBQueryError "Commands out of sync" from Rdbms\Database::close.
Aug 12 2019, 7:41 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Performance-Team, Wikimedia-Rdbms, Wikimedia-production-error
aaron moved T230260: Page view triggers ResourceLoaderWikiModule db queries for enabled gadgets (from OutputPage) from Inbox to Radar on the Performance-Team board.
Aug 12 2019, 7:39 PM · Performance-Team, MediaWiki-ResourceLoader, MediaWiki-Cache, MediaWiki-extensions-Gadgets
aaron added a comment to T51195: Drop filejournal table from WMF.

It's an optional table, not installed by update.php.

Aug 12 2019, 6:26 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), DBA, Performance-Team (Radar), MediaWiki-File-management

Aug 9 2019

aaron updated subscribers of T229062: Look into a simple way to have global keys with db-replicated.
Aug 9 2019, 4:46 PM · Patch-For-Review, Performance-Team (Radar), MediaWiki-Cache
aaron updated the task description for T229062: Look into a simple way to have global keys with db-replicated.
Aug 9 2019, 4:45 PM · Patch-For-Review, Performance-Team (Radar), MediaWiki-Cache
aaron added a comment to T226167: audit public tables and make sure we dump them all.

They were obsoleted by flaggedrevs_statistics.

Aug 9 2019, 5:54 AM · Patch-For-Review, Dumps-Generation

Aug 8 2019

aaron closed T226432: Investigate use of vary-revision flags on group2 wikis as Resolved.

The remaining vary-revision instances are basic self-transclusions (https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/526157/ should handle those).

Aug 8 2019, 11:03 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Patch-For-Review, Performance-Team
aaron closed T211220: Update Save Timing grafana dashboards to break down by content model as Resolved.
Aug 8 2019, 10:15 PM · Performance-Team

Aug 5 2019

aaron committed rEFLI5f2aaa110ee0: Convert FileImporterSuccessCache to "db-replicated" cache (authored by aaron).
Convert FileImporterSuccessCache to "db-replicated" cache
Aug 5 2019, 3:22 PM

Aug 2 2019

aaron created T229694: Warning: EchoModerationController::moderate: transaction round 'MWCallableUpdate::doUpdate' still running.
Aug 2 2019, 7:53 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), Core Platform Team Workboards (Clinic Duty Team), Regression, MediaWiki-General, Performance-Team

Aug 1 2019

aaron merged T229605: File pages are not created: Fatal exception of type "Wikimedia\Rdbms\DBQueryError" into T229589: PHP Notice: Undefined property: MediaWiki\Revision\RevisionRenderer::$wikiId.
Aug 1 2019, 8:01 PM · MW-1.34-notes (1.34.0-wmf.16; 2019-07-30), Performance-Team, MediaWiki-Revision-backend, Wikisource, Wikimedia-production-error
aaron merged task T229605: File pages are not created: Fatal exception of type "Wikimedia\Rdbms\DBQueryError" into T229589: PHP Notice: Undefined property: MediaWiki\Revision\RevisionRenderer::$wikiId.
Aug 1 2019, 8:01 PM · MediaWiki-Revision-backend, Wikimedia-production-error, video2commons, Commons, Wikimedia-database-error

Jul 31 2019

aaron added a comment to T212881: addWiki.php broken creating ES tables.

Is https://phabricator.wikimedia.org/T212881#5195101 the error that still happens or is it the read-only one too?

Jul 31 2019, 11:34 PM · MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Performance-Team, Patch-For-Review, MediaWiki-extensions-WikimediaMaintenance
aaron committed rERLS2efa2553f868: Cleanup use of IDatabase::affectedRows() (authored by aaron).
Cleanup use of IDatabase::affectedRows()
Jul 31 2019, 3:56 PM
aaron added a comment to T219592: Frequent Echo DB_MASTER write queries on HTTP GET.

Jobs are fine...though this case is complicated since people want their "latest views" to be immediately reflected...so it would have to do something like WatchedItemStore.

Jul 31 2019, 6:16 AM · CPT Initiatives (Multi-DC (TEC1)), Growth-Team, Notifications, Services (watching), Performance-Team (Radar), Availability (MediaWiki-MultiDC)
aaron added a comment to T212881: addWiki.php broken creating ES tables.

How much of this is distinct from T205936?

Jul 31 2019, 2:38 AM · MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Performance-Team, Patch-For-Review, MediaWiki-extensions-WikimediaMaintenance

Jul 27 2019

aaron committed rESRD5898e5267daa: Cleaned up recache() to behave more like the parent method (authored by aaron).
Cleaned up recache() to behave more like the parent method
Jul 27 2019, 11:16 PM

Jul 25 2019

aaron created T229062: Look into a simple way to have global keys with db-replicated.
Jul 25 2019, 9:38 PM · Patch-For-Review, Performance-Team (Radar), MediaWiki-Cache

Jul 23 2019

aaron added a comment to T227401: MediaWiki should query master instead of replica if replica is too lagged.

I wonder if this is fixed in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/519565/

Jul 23 2019, 7:53 PM · Core Platform Team, MW-1.34-release, Wikimedia-Rdbms
aaron closed T212284: Fatal db error "Could not select database 'centralauth'" (sometimes also 'metawiki') as Resolved.

The logs for doSelectDomain() look quiet for the last 7 days.

Jul 23 2019, 4:43 PM · Core Platform Team Workboards (Clinic Duty Team), MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Core Platform Team (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), Performance-Team (Radar), Services (next), Wikimedia-Rdbms, MediaWiki-extensions-CentralAuth, Wikimedia-production-error
aaron added a comment to T228749: AssembleUploadChunksJob triggers: SqlBagOStuff: tries to serialize closure.

959daa2ca44c039e72c8a9a5199d4c74dd05caba added the << $status->value = [ 'warnings' => $upload->checkWarnings() ]; >> line. It seems like checkWarnings() has all kinds of File objects inside of it potentially. Some callback could easily slip in given that.

Jul 23 2019, 4:19 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Core Platform Team (Needs Cleaning - Code Health (TEC13)), Core Platform Team Workboards (Clinic Duty Team), MediaWiki-Uploading, Multimedia, Wikimedia-production-error

Jul 22 2019

aaron added a comment to T212129: Use a multi-dc aware store for ObjectCache's MainStash if needed..

ObjectCache always mentioned getMainStashInstance() as "Ephemeral global storage". It was just supposed to *try harder* to be persistent than memcached (rdb snapshots, expectation that stuff can *probably* still be there a week later or so). The existence of redis evictions and consistent re-hashing on host failure making data disappear or go stale was well known at the time it was picked as the original "stash".

Jul 22 2019, 8:10 PM · CPT Initiatives (Mainstash Multi-DC), MediaWiki-General, serviceops-radar, User-mobrovac, User-jijiki, Performance-Team (Radar), Operations
aaron added a comment to T228465: TorBlock maintenance failures on labweb hosts.

Fixed and confirmed

[0450][www-data@labweb1001:/]$ time /usr/local/bin/mwscript extensions/TorBlock/maintenance/loadExitNodes.php --wiki=labswiki --force
Successfully loaded 1206 exit nodes.
real	0m1.958s
user	0m0.572s
sys	0m0.212s

I guess the cron jobs should be removed now, is there a bug for that?

Jul 22 2019, 7:27 PM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Patch-For-Review, MediaWiki-extensions-TorBlock

Jul 19 2019

aaron added a comment to T228525: If JobQueueEventBus fails to send a job exception is left uncaught.

JobQueueException should be thrown from push(), with nothing catching it other than MWExceptionHandler or site-specific callers. Things like RenameUser *depend* on knowing whether something enqueued or not in order to function correctly. Typically, push() should be used pre-send, before preOutputCommit, so everything would just roll back anyway. Jobs pushed after that are enqueued during DeferrableUpdates (directly or indirectly via lazyPush()); in that case, DeferredUpdates should (already) catch any exceptions (not just job queue ones) and roll back on an update-by-update basis. The exceptions are logged in the DeferredUpdates channel (previously the Exception channel).
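The two behaviours described above, an uncaught exception from push() versus per-update catching in deferred updates, can be sketched like this. It is a Python illustration under stated assumptions, not MediaWiki code: `JobQueueError`, `push`, `run_deferred_updates`, and the job names are hypothetical stand-ins, and "rolled back" is only recorded as a label rather than performed against a database.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("DeferredUpdates")


class JobQueueError(Exception):
    """Hypothetical stand-in for JobQueueException."""


def push(queue: List[str], job: str, backend_up: bool = True) -> None:
    """Enqueue a job, raising instead of failing silently so callers know."""
    if not backend_up:
        raise JobQueueError(f"could not enqueue {job}")
    queue.append(job)


def run_deferred_updates(updates: List[Callable[[], None]]) -> List[str]:
    """Run each update in isolation; one failure does not abort the rest."""
    outcomes = []
    for update in updates:
        try:
            update()
            outcomes.append("committed")
        except Exception:
            # Any exception (not just job queue ones) is logged and only
            # this update is treated as rolled back.
            logger.exception("deferred update failed")
            outcomes.append("rolled back")
    return outcomes


queue: List[str] = []
outcomes = run_deferred_updates([
    lambda: push(queue, "renameUser"),
    lambda: push(queue, "htmlCacheUpdate", backend_up=False),
    lambda: push(queue, "refreshLinks"),
])
```

Called directly, outside the deferred runner, a failing `push` propagates its exception to the caller, which is the behaviour that code such as RenameUser relies on.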

Jul 19 2019, 7:47 PM · Performance-Team, Core Platform Team Workboards (Clinic Duty Team), Core Platform Team (Needs Cleaning - Security, stability, performance, and scalability (TEC1)), WMF-JobQueue, Wikimedia-production-error
aaron added a comment to T228436: web request timeout after 200 seconds due to Wikimedia\Rdbms\LBFactory->__destruct() > Wikimedia\Rdbms\LBFactory->commitMasterChanges().

Also, the timeout exceptions themselves were from redis, not LBFactory. The latter seemed to just have errors related to the improper shutdown.

Jul 19 2019, 7:38 PM · Performance-Team, Wikimedia-Rdbms, Wikimedia-production-error
aaron added a comment to T225103: LBFactory destructor causes unexpected exception at shutdown.

Is this still a train blocker?

Jul 19 2019, 6:11 AM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Wikimedia-Rdbms, Performance-Team, Wikimedia-production-error

Jul 18 2019

aaron added a comment to T51195: Drop filejournal table from WMF.

Dropping the field doesn't make sense, but dropping the whole table does. We do not use that class in production (and it is optional within MW core).

Jul 18 2019, 9:51 PM · MW-1.34-notes (1.34.0-wmf.20; 2019-08-27), DBA, Performance-Team (Radar), MediaWiki-File-management
aaron claimed T228465: TorBlock maintenance failures on labweb hosts.
Jul 18 2019, 9:41 PM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Patch-For-Review, MediaWiki-extensions-TorBlock
aaron added a parent task for T228465: TorBlock maintenance failures on labweb hosts: T220739: 1.34.0-wmf.14 deployment blockers.
Jul 18 2019, 8:39 PM · MW-1.34-notes (1.34.0-wmf.14; 2019-07-16), Patch-For-Review, MediaWiki-extensions-TorBlock
aaron added a subtask for T220739: 1.34.0-wmf.14 deployment blockers: T228465: TorBlock maintenance failures on labweb hosts.
Jul 18 2019, 8:39 PM · Release-Engineering-Team-TODO (201907), Release-Engineering-Team (Deployment services), Release, Train Deployments
aaron closed T228303: Redis exception connecting to "/var/run/nutcracker/redis_eqiad.sock": read error on connection, a subtask of T220739: 1.34.0-wmf.14 deployment blockers, as Resolved.
Jul 18 2019, 8:37 PM · Release-Engineering-Team-TODO (201907), Release-Engineering-Team (Deployment services), Release, Train Deployments
aaron closed T228303: Redis exception connecting to "/var/run/nutcracker/redis_eqiad.sock": read error on connection as Resolved.
Jul 18 2019, 8:36 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), serviceops, Operations, Performance-Team, MediaWiki-Cache, Wikimedia-production-error
aaron added a parent task for T225103: LBFactory destructor causes unexpected exception at shutdown: T220739: 1.34.0-wmf.14 deployment blockers.
Jul 18 2019, 8:36 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Wikimedia-Rdbms, Performance-Team, Wikimedia-production-error
aaron added a parent task for T228303: Redis exception connecting to "/var/run/nutcracker/redis_eqiad.sock": read error on connection: T220739: 1.34.0-wmf.14 deployment blockers.
Jul 18 2019, 8:36 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), serviceops, Operations, Performance-Team, MediaWiki-Cache, Wikimedia-production-error
aaron added subtasks for T220739: 1.34.0-wmf.14 deployment blockers: T225103: LBFactory destructor causes unexpected exception at shutdown, T228303: Redis exception connecting to "/var/run/nutcracker/redis_eqiad.sock": read error on connection.
Jul 18 2019, 8:36 PM · Release-Engineering-Team-TODO (201907), Release-Engineering-Team (Deployment services), Release, Train Deployments
aaron added a comment to T228436: web request timeout after 200 seconds due to Wikimedia\Rdbms\LBFactory->__destruct() > Wikimedia\Rdbms\LBFactory->commitMasterChanges().

The redis bug is tracked at T228303.

Jul 18 2019, 8:16 PM · Performance-Team, Wikimedia-Rdbms, Wikimedia-production-error
aaron merged T228436: web request timeout after 200 seconds due to Wikimedia\Rdbms\LBFactory->__destruct() > Wikimedia\Rdbms\LBFactory->commitMasterChanges() into T225103: LBFactory destructor causes unexpected exception at shutdown.
Jul 18 2019, 8:15 PM · MW-1.34-notes (1.34.0-wmf.15; 2019-07-23), Wikimedia-Rdbms, Performance-Team, Wikimedia-production-error
aaron merged task T228436: web request timeout after 200 seconds due to Wikimedia\Rdbms\LBFactory->__destruct() > Wikimedia\Rdbms\LBFactory->commitMasterChanges() into T225103: LBFactory destructor causes unexpected exception at shutdown.
Jul 18 2019, 8:15 PM · Performance-Team, Wikimedia-Rdbms, Wikimedia-production-error
aaron added a comment to T228436: web request timeout after 200 seconds due to Wikimedia\Rdbms\LBFactory->__destruct() > Wikimedia\Rdbms\LBFactory->commitMasterChanges().

The timeouts correspond with the redis problems:

Jul 18 2019, 8:14 PM · Performance-Team, Wikimedia-Rdbms, Wikimedia-production-error
aaron added a comment to T228436: web request timeout after 200 seconds due to Wikimedia\Rdbms\LBFactory->__destruct() > Wikimedia\Rdbms\LBFactory->commitMasterChanges().

The timeout aspect seems strange. The huge "idle" time increase at https://grafana.wikimedia.org/d/000000273/mysql suggests that PageEditStash::parseAndCache() has an infinite lock timeout instead of 0 seconds (a bug; it should be 0, i.e. non-blocking) and that the parsing may have been slowed down for some reason, making more threads wait on the lock. Maybe the concurrent nutcracker issues were also affecting mcrouter (since the same hosts are used). Something could also be adding memcached write load: https://grafana.wikimedia.org/d/000000316/memcache?orgId=1&from=1563458818482&to=1563464680644 looks a little unusual, though not unlike the result of key version changes that happen from release to release (including the slow return to the normal set() rate).
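
The difference between a 0-second (non-blocking) and an unbounded lock timeout can be illustrated with a minimal sketch. It is written in Python rather than MediaWiki's PHP, it is not the PageEditStash code, and `try_stash` and `work` are hypothetical names chosen for illustration only.

```python
import threading


def try_stash(lock, work, timeout=0.0):
    """Run work() only if the lock can be taken within `timeout` seconds.

    timeout == 0 means non-blocking: return at once if the lock is held.
    A negative timeout would wait forever, which is the failure mode
    described in the comment above.
    """
    if timeout == 0:
        acquired = lock.acquire(blocking=False)
    elif timeout < 0:
        acquired = lock.acquire()  # unbounded wait (illustrates the bug)
    else:
        acquired = lock.acquire(timeout=timeout)

    if not acquired:
        # Another thread is already doing the work, so skip it.
        return None
    try:
        return work()
    finally:
        lock.release()
```

With a timeout of 0, a caller that finds the lock held returns immediately instead of queueing behind slow work, so slow parsing would not pile up waiting threads.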

Jul 18 2019, 8:09 PM · Performance-Team, Wikimedia-Rdbms, Wikimedia-production-error
aaron created T228468: Move stats updates from AuthManager::autoCreateUser() HTTP GET to the job queue.
Jul 18 2019, 7:46 PM · Availability (MediaWiki-MultiDC), Performance-Team
aaron added a comment to T225642: Allow async foreign set/delete WAN cache operations in mcrouter.

OK, replication for SET/DELETE seems fine on mw1261/mw2224 for me and the STORED/NOT_STORED and FOUND/NOT_FOUND replies are what I expect when using (no prefix, /otherdc/mw-wan, and /thisdc/mwwan).

Jul 18 2019, 7:07 PM · User-Elukey, Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations
aaron added a comment to T225642: Allow async foreign set/delete WAN cache operations in mcrouter.

Err, more PEBCAK. I put the * in the wrong spot...

Jul 18 2019, 7:01 PM · User-Elukey, Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations
aaron added a comment to T225642: Allow async foreign set/delete WAN cache operations in mcrouter.

So, I've noticed that on mw1261/mw2224 as *well* as plain old mwmaint1002/mwmaint2001, broadcasting keys doesn't seem to work, e.g.:

Jul 18 2019, 6:49 PM · User-Elukey, Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations