aaron (Aaron Schulz)
User

User Details

User Since
Oct 20 2014, 5:25 PM (199 w, 1 d)
Availability
Available
IRC Nick
AaronSchulz
LDAP User
Aaron Schulz
MediaWiki User
Aaron Schulz

Recent Activity

Yesterday

aaron closed T118893: Consider using APC for the individually cached keys (e.g. 'TOO BIG') in MessageCache as Resolved.
Tue, Aug 14, 6:16 PM · MW-1.32-release-notes (WMF-deploy-2018-07-31 (1.32.0-wmf.15)), Patch-For-Review, Performance-Team, MediaWiki-Cache

Mon, Aug 13

aaron added a comment to T185724: Publish Doxygen for RunningStat library.

Where are the jenkins jobs defined?

Mon, Aug 13, 8:23 PM · Librarization, Performance-Team, RunningStat, Continuous-Integration-Config
aaron claimed T200471: [regression] LBFactorySimple breaks ExternalStorage, trying to connect to external server with local database name.
Mon, Aug 13, 8:16 PM · Patch-For-Review, MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), MW-1.31-release-notes, Performance-Team, Regression, MW-1.31-release, MediaWiki-Database
aaron updated the task description for T198239: Rollout use of mcrouter for MediaWiki in production.
Mon, Aug 13, 7:59 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team

Fri, Aug 10

aaron added a comment to T164860: Update Echo's caching strategy for multi-dc compatibility.

Can this task be closed?

Fri, Aug 10, 8:02 PM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Performance-Team (Radar), Growth-Team (Current Sprint), Availability (MediaWiki-MultiDC), Collaboration-Team-Triage, Notifications

Wed, Aug 8

aaron added a comment to T201482: LinksUpdate fails, spams exception logs, whenever replication lag on any server rises above 10s.

Something like that approach seems worth trying.

Wed, Aug 8, 9:11 PM · Performance-Team (Radar), Core-Platform-Team, MediaWiki-Database
aaron added a comment to T154719: PageTriage opens master connection on GET for ArticleMetadata cache misses.

Is it possible to just update pagetriage_page_tags on page saves (and other relevant POST requests) when there are already master connections? Anything that depends on data updated via the job queue (like backlinks) would have to be attached to the corresponding LinksUpdates (which already run in POST requests/jobs). Why do things have to be updated on page views?

Wed, Aug 8, 6:38 PM · Performance-Team (Radar), Patch-For-Review, Growth-Team (Current Sprint), Collaboration-Team-Triage (Collab-Team-This-Quarter), Availability, MediaWiki-extensions-PageCuration
aaron closed T196608: Notice: Undefined index: ChronologyProtection in /srv/mediawiki/core/includes/libs/rdbms/lbfactory/LBFactory.php on line 504 in web upgrader as Resolved.
Wed, Aug 8, 5:47 PM · MW-1.32-release-notes (WMF-deploy-2018-08-07 (1.32.0-wmf.16)), Patch-For-Review, Performance-Team, MediaWiki-Installer, MediaWiki-Database

Sat, Aug 4

aaron added a comment to T201016: Include ADD operation in memcached stats and grafana dashboard.

I noticed that regular memcached counts ADD the same way it counts SET (under cmd_set). There is no cmd_add. However, mcrouter does seem to expose a cmd_add counter. Perhaps there can be a mcrouter dashboard similar to the Memcache one in Grafana?

Sat, Aug 4, 12:10 AM · Graphite, Operations

Fri, Aug 3

aaron removed a subtask for T88445: MediaWiki active/active datacenter investigation and work (tracking): T164504: Tracking: Cleanup x1 database connection patterns.
Fri, Aug 3, 6:07 PM · Availability (MediaWiki-MultiDC), Performance-Team, Epic
aaron removed a parent task for T164504: Tracking: Cleanup x1 database connection patterns: T88445: MediaWiki active/active datacenter investigation and work (tracking).
Fri, Aug 3, 6:07 PM · DBA
aaron removed a project from T164504: Tracking: Cleanup x1 database connection patterns: Availability (MediaWiki-MultiDC).
Fri, Aug 3, 6:05 PM · DBA
aaron added a comment to T164504: Tracking: Cleanup x1 database connection patterns.

Are there any tasks here that remain and are blockers to multi-DC?

Fri, Aug 3, 12:10 AM · DBA

Thu, Aug 2

aaron created T201016: Include ADD operation in memcached stats and grafana dashboard.
Thu, Aug 2, 3:32 PM · Graphite, Operations

Wed, Aug 1

aaron updated the task description for T198239: Rollout use of mcrouter for MediaWiki in production.
Wed, Aug 1, 5:24 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team

Mon, Jul 30

aaron added a comment to T196608: Notice: Undefined index: ChronologyProtection in /srv/mediawiki/core/includes/libs/rdbms/lbfactory/LBFactory.php on line 504 in web upgrader.

Regression from fb51330084b4bde1880c76589e55e7cd87ed0c6d I assume

Mon, Jul 30, 11:59 PM · MW-1.32-release-notes (WMF-deploy-2018-08-07 (1.32.0-wmf.16)), Patch-For-Review, Performance-Team, MediaWiki-Installer, MediaWiki-Database
aaron moved T198239: Rollout use of mcrouter for MediaWiki in production from Next-up to Doing on the Performance-Team board.
Mon, Jul 30, 8:19 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team
aaron moved T196608: Notice: Undefined index: ChronologyProtection in /srv/mediawiki/core/includes/libs/rdbms/lbfactory/LBFactory.php on line 504 in web upgrader from Next-up to Doing on the Performance-Team board.
Mon, Jul 30, 8:19 PM · MW-1.32-release-notes (WMF-deploy-2018-08-07 (1.32.0-wmf.16)), Patch-For-Review, Performance-Team, MediaWiki-Installer, MediaWiki-Database
aaron triaged T189702: Replace transcache table with objectcache backend as Low priority.
Mon, Jul 30, 8:12 PM · Core-Platform-Team, Performance-Team, MediaWiki-Templates, Patch-For-Review
aaron moved T189702: Replace transcache table with objectcache backend from Inbox to Blocked on the Performance-Team board.
Mon, Jul 30, 8:12 PM · Core-Platform-Team, Performance-Team, MediaWiki-Templates, Patch-For-Review
aaron moved T196608: Notice: Undefined index: ChronologyProtection in /srv/mediawiki/core/includes/libs/rdbms/lbfactory/LBFactory.php on line 504 in web upgrader from Inbox to Next-up on the Performance-Team board.
Mon, Jul 30, 8:11 PM · MW-1.32-release-notes (WMF-deploy-2018-08-07 (1.32.0-wmf.16)), Patch-For-Review, Performance-Team, MediaWiki-Installer, MediaWiki-Database
aaron claimed T196608: Notice: Undefined index: ChronologyProtection in /srv/mediawiki/core/includes/libs/rdbms/lbfactory/LBFactory.php on line 504 in web upgrader.
Mon, Jul 30, 8:11 PM · MW-1.32-release-notes (WMF-deploy-2018-08-07 (1.32.0-wmf.16)), Patch-For-Review, Performance-Team, MediaWiki-Installer, MediaWiki-Database
aaron moved T200506: Previewing a non-style-only gadget that you already have enabled causes a syntax error from Inbox to Next-up on the Performance-Team board.
Mon, Jul 30, 8:10 PM · Patch-For-Review, Performance-Team, MediaWiki-ResourceLoader
aaron assigned T200506: Previewing a non-style-only gadget that you already have enabled causes a syntax error to Krinkle.
Mon, Jul 30, 8:10 PM · Patch-For-Review, Performance-Team, MediaWiki-ResourceLoader
aaron triaged T200629: Using fully-qualified function calls is faster as Low priority.
Mon, Jul 30, 8:09 PM · MediaWiki-Codesniffer, Performance-Team, Performance, MediaWiki-General-or-Unknown
TK-999 awarded T189702: Replace transcache table with objectcache backend a Love token.
Mon, Jul 30, 12:47 PM · Core-Platform-Team, Performance-Team, MediaWiki-Templates, Patch-For-Review
aaron closed T199762: WikiPage::updateCategoryCounts causing Lock wait timeout exceeded as Resolved.
Mon, Jul 30, 3:50 AM · Wikimedia-log-errors, Performance-Team, Core-Platform-Team, MediaWiki-Database
aaron closed T199762: WikiPage::updateCategoryCounts causing Lock wait timeout exceeded, a subtask of T30499: 1205: Lock wait timeout exceeded; try restarting transaction (tracking), as Resolved.
Mon, Jul 30, 3:50 AM · Technical-Debt, Tracking, MediaWiki-Database

Fri, Jul 27

aaron added a comment to T200471: [regression] LBFactorySimple breaks ExternalStorage, trying to connect to external server with local database name.

This may be caused by rMW14ee3f210782 self-merged by @aaron

Fri, Jul 27, 1:16 AM · Patch-For-Review, MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), MW-1.31-release-notes, Performance-Team, Regression, MW-1.31-release, MediaWiki-Database

Thu, Jul 26

aaron added a comment to T199762: WikiPage::updateCategoryCounts causing Lock wait timeout exceeded.

From https://logstash.wikimedia.org/goto/0b9191830a12ab3d15bce062cdb36a93, this seemed to be better. But we should wait longer.

Thu, Jul 26, 10:48 PM · Wikimedia-log-errors, Performance-Team, Core-Platform-Team, MediaWiki-Database
aaron added a comment to T200468: Percona XtraDB Cluster gives error when using GET_LOCK() when pxc_strict_mode=ENFORCING is set (e.g. By ApiStashEdit.php).

At a glance, it looks like XtraDB Cluster is built on Galera (which is itself something to consider in the future). Use of GET_LOCK is tricky there, since it would have to use wsrep or have such queries directed to a dedicated master (perhaps with some HA in front that doesn't split-brain).

Thu, Jul 26, 10:19 PM · MediaWiki-Database
aaron added a comment to T200420: Wikidata dispatching stuck (not releasing lockmanager locks).

Ah, right, I read that ternary backwards, <<$maxTime < PHP_INT_MAX ? PHP_INT_MAX : 1>>.

Thu, Jul 26, 5:34 PM · MW-1.32-release-notes (WMF-deploy-2018-07-31 (1.32.0-wmf.15)), Patch-For-Review, User-Addshore, Wikidata-Campsite, Wikidata
aaron added a comment to T200420: Wikidata dispatching stuck (not releasing lockmanager locks).

Something to note, because the locks are no longer in the DB, we end up selecting the same 15 or so wikis that are locked all of the time.
It could be that the other wikis actually don't have locks:

before using the redis lock manager the status of the lock from the db was also in the select so that locked dbs would not be selected at all.

Thu, Jul 26, 5:05 PM · MW-1.32-release-notes (WMF-deploy-2018-07-31 (1.32.0-wmf.15)), Patch-For-Review, User-Addshore, Wikidata-Campsite, Wikidata

Thu, Jul 19

aaron created T200026: RepoGroup exceptions due to "false" being passed as a key to MapCacheLRU.
Thu, Jul 19, 4:20 PM · MW-1.32-release-notes (WMF-deploy-2018-07-24 (1.32.0-wmf.14)), Performance-Team, Patch-For-Review, Release-Engineering-Team (Kanban), Release, Train Deployments
aaron added a comment to T199594: Exception "Job queue is read-only".

Normally, it would be odd to let jobs pile up but not execute them, though the multi-DC use case of $wgReadOnly in one of the DCs wasn't considered in T130795. Ideally, jobs enqueued on GET/HEAD wouldn't be a thing...but that's not going away anytime soon.

Thu, Jul 19, 12:52 PM · Services (done), MW-1.32-release-notes (WMF-deploy-2018-07-24 (1.32.0-wmf.14)), User-Joe, Operations, Wikimedia-log-errors, Core-Platform-Team, WMF-JobQueue

Tue, Jul 17

aaron added a comment to T199762: WikiPage::updateCategoryCounts causing Lock wait timeout exceeded.

My first inclination is to try to reduce the refreshCounts() calls.

Tue, Jul 17, 8:22 PM · Wikimedia-log-errors, Performance-Team, Core-Platform-Team, MediaWiki-Database
aaron updated the task description for T198239: Rollout use of mcrouter for MediaWiki in production.
Tue, Jul 17, 3:30 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team

Mon, Jul 16

aaron removed a project from T92357: Fix problematic database master queries performed on HTTP GET/HEAD: MW-1.30-release-notes (WMF-deploy-2017-06-13_(1.30.0-wmf.5)).
Mon, Jul 16, 9:12 PM · Availability (MediaWiki-MultiDC), Patch-For-Review, MediaWiki-General-or-Unknown
aaron placed T95501: Fix causes of slave lag and get it to under 5 seconds at peak up for grabs.
Mon, Jul 16, 9:12 PM · Goal, Performance-Team, Availability
aaron placed T190260: Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError" trying to undelete a file up for grabs.
Mon, Jul 16, 8:37 PM · MW-1.32-release-notes (WMF-deploy-2018-07-24 (1.32.0-wmf.14)), Multimedia, Core-Platform-Team, Patch-For-Review, MediaWiki-Page-deletion, MediaWiki-File-management, Performance-Team

Jul 11 2018

aaron closed T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index. as Resolved.
Jul 11 2018, 12:43 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Release-Engineering-Team (Watching / External), Performance-Team, MediaWiki-Database, Wikimedia-log-errors
aaron closed T199218: MemcachedBagOStuff.php: Key contains invalid characters as Resolved.

This was fixed by the 61a7e1acd0af4a5386df03335733accfde179fa1 backport.

Jul 11 2018, 10:07 AM · MediaWiki-General-or-Unknown
aaron closed T199039: "Fatal exception of type "Exception"" when using Special:LanguageStats on MediaWiki.org as Resolved.

Fixed with the 61a7e1acd0af4a5386df03335733accfde179fa1 backport.

Jul 11 2018, 10:06 AM · Wikimedia-log-errors, Patch-For-Review, MediaWiki-extensions-Translate
aaron updated subscribers of T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index..

Change 445110 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] rdbms: fix value of ChronologyProtector::POSITION_COOKIE_TTL

https://gerrit.wikimedia.org/r/445110

Jul 11 2018, 9:39 AM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Release-Engineering-Team (Watching / External), Performance-Team, MediaWiki-Database, Wikimedia-log-errors
aaron added a comment to T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index..

Given how low server_failure_limit is, it might help to lower server_retry_timeout from 30s to something < 5s. Consistent hash ejections seem like the most obvious thing that could cause an acknowledged write to be seen as not being there for any of the next 5 seconds.
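
The ejection effect described above can be illustrated with a small sketch. It is a toy model, not nutcracker's actual hashing: the `Ring` class, the server names, and the key are all hypothetical, and timing (`server_retry_timeout`) is modelled only as a manual eject/rejoin.

```python
import hashlib
from bisect import bisect_right


def _hash(text: str) -> int:
    """Map a string to a point on the ring (md5 is an arbitrary choice here)."""
    return int(hashlib.md5(text.encode()).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring with temporary server ejection (hypothetical)."""

    def __init__(self, servers):
        self.stores = {name: {} for name in servers}  # one dict per fake server
        self.ejected = set()
        self.points = sorted((_hash(name), name) for name in servers)

    def server_for(self, key):
        # Walk clockwise from the key's point, skipping ejected servers.
        idx = bisect_right([point for point, _ in self.points], _hash(key))
        count = len(self.points)
        for step in range(count):
            name = self.points[(idx + step) % count][1]
            if name not in self.ejected:
                return name
        raise RuntimeError("no live servers")

    def set(self, key, value):
        self.stores[self.server_for(key)][key] = value

    def get(self, key):
        return self.stores[self.server_for(key)].get(key)


ring = Ring(["mc1", "mc2", "mc3"])
key = "cp:positions"           # hypothetical key name
ring.set(key, "pos-123")       # write is acknowledged by the key's home server
home = ring.server_for(key)

ring.ejected.add(home)         # home server is ejected after failures
during_ejection = ring.get(key)  # read is routed elsewhere and misses

ring.ejected.discard(home)     # retry timeout elapses, server rejoins
after_rejoin = ring.get(key)   # read reaches the home server again

print(during_ejection, after_rejoin)
```

In this model the acknowledged write is invisible for as long as the home server stays ejected, which is why a shorter retry timeout would narrow the window.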

Jul 11 2018, 9:07 AM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Release-Engineering-Team (Watching / External), Performance-Team, MediaWiki-Database, Wikimedia-log-errors

Jul 10 2018

aaron updated the task description for T198239: Rollout use of mcrouter for MediaWiki in production.
Jul 10 2018, 6:37 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team
aaron closed T199216: SpamBlacklist / MemcachedBagOStuff.php: Key contains invalid characters error prevents pages from being saved on many Wikipedias as Resolved.

The configuration change was reverted. It will be fine to re-apply it once 4ad6b70ba132c66e14a706eae240887885946a42 is merged (I thought that 13 day old change landed already).

Jul 10 2018, 12:42 PM · SpamBlacklist, Patch-For-Review, MediaWiki-Cache, Wikimedia-log-errors
aaron added a comment to T199216: SpamBlacklist / MemcachedBagOStuff.php: Key contains invalid characters error prevents pages from being saved on many Wikipedias.
Key contains invalid characters: zhwiki:blacklist:spam:pass:593a98fd45a6bef5b16ab648bf2eea6368883524:人類疱疹病毒第四型
#0 /srv/mediawiki/php-1.32.0-wmf.10/includes/libs/objectcache/MemcachedPeclBagOStuff.php(154): MemcachedBagOStuff->validateKeyEncoding(string)
#1 /srv/mediawiki/php-1.32.0-wmf.10/includes/libs/objectcache/MemcachedBagOStuff.php(56): MemcachedPeclBagOStuff->getWithToken(string, NULL, integer)
#2 /srv/mediawiki/php-1.32.0-wmf.10/includes/libs/objectcache/BagOStuff.php(197): MemcachedBagOStuff->doGet(string, integer)
#3 /srv/mediawiki/php-1.32.0-wmf.10/includes/libs/objectcache/ReplicatedBagOStuff.php(80): BagOStuff->get(string, integer)
#4 /srv/mediawiki/php-1.32.0-wmf.10/includes/libs/objectcache/BagOStuff.php(197): ReplicatedBagOStuff->doGet(string, integer)
#5 /srv/mediawiki/php-1.32.0-wmf.10/extensions/SpamBlacklist/includes/SpamBlacklist.php(78): BagOStuff->get(string)
#6 /srv/mediawiki/php-1.32.0-wmf.10/extensions/SpamBlacklist/includes/SpamBlacklist.php(293): SpamBlacklist->filter(array, Title, boolean, string)
#7 /srv/mediawiki/php-1.32.0-wmf.10/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php(67): SpamBlacklist->warmCachesForFilter(Title, array)
#8 /srv/mediawiki/php-1.32.0-wmf.10/includes/Hooks.php(174): SpamBlacklistHooks::onParserOutputStashForEdit(WikiPage, WikitextContent, ParserOutput, string, User)
#9 /srv/mediawiki/php-1.32.0-wmf.10/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
#10 /srv/mediawiki/php-1.32.0-wmf.10/includes/api/ApiStashEdit.php(215): Hooks::run(string, array)
#11 /srv/mediawiki/php-1.32.0-wmf.10/includes/api/ApiStashEdit.php(151): ApiStashEdit::parseAndStash(WikiPage, WikitextContent, User, string)
#12 /srv/mediawiki/php-1.32.0-wmf.10/includes/api/ApiMain.php(1584): ApiStashEdit->execute()
#13 /srv/mediawiki/php-1.32.0-wmf.10/includes/api/ApiMain.php(535): ApiMain->executeAction()
#14 /srv/mediawiki/php-1.32.0-wmf.10/includes/api/ApiMain.php(506): ApiMain->executeActionWithErrorHandling()
#15 /srv/mediawiki/php-1.32.0-wmf.10/api.php(83): ApiMain->execute()
#16 /srv/mediawiki/w/api.php(3): include(string)
#17 {main}
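
A minimal sketch of the kind of check that fails in the trace above, assuming the validation rejects any character outside printable ASCII (0x21 to 0x7e). The function names are hypothetical, and percent-encoding is just one possible way to make a key safe, not necessarily what the referenced fix does.

```python
import re
from urllib.parse import quote

# Assumption: only printable, non-space ASCII is accepted in a key.
_INVALID = re.compile(r"[^\x21-\x7e]")


def validate_key(key: str) -> str:
    """Raise if the key has characters outside the assumed safe range."""
    if _INVALID.search(key):
        raise ValueError(f"Key contains invalid characters: {key}")
    return key


def make_safe_key(*components: str) -> str:
    """Percent-encode each component, then join them with ':'."""
    return ":".join(quote(part, safe="") for part in components)


raw = "zhwiki:blacklist:spam:pass:人類疱疹病毒第四型"
try:
    validate_key(raw)
    rejected = False
except ValueError:
    rejected = True

safe = make_safe_key("zhwiki", "blacklist", "spam", "pass", "人類疱疹病毒第四型")
print(rejected, validate_key(safe) == safe)
```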
Jul 10 2018, 11:53 AM · SpamBlacklist, Patch-For-Review, MediaWiki-Cache, Wikimedia-log-errors
aaron closed T198483: Save Timing increased 50% since 2018-06-28 20:53 as Resolved.
Jul 10 2018, 9:17 AM · MW-1.32-release-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Patch-For-Review, Performance-Team, Release-Engineering-Team
aaron closed T198483: Save Timing increased 50% since 2018-06-28 20:53, a subtask of T191056: 1.32.0-wmf.10 deployment blockers, as Resolved.
Jul 10 2018, 9:17 AM · Patch-For-Review, Release-Engineering-Team (Kanban), Release, Train Deployments

Jul 9 2018

aaron created T199150: Audio player is non-functional on first load in Chrome.
Jul 9 2018, 8:27 PM · TimedMediaHandler
aaron added a comment to T198483: Save Timing increased 50% since 2018-06-28 20:53.

I wonder why this (or even the previous issue) does not show up on the edit stash dashboard? This should be significantly affecting stash hit ratios.

Jul 9 2018, 7:17 PM · MW-1.32-release-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Patch-For-Review, Performance-Team, Release-Engineering-Team

Jul 6 2018

aaron added a comment to T198483: Save Timing increased 50% since 2018-06-28 20:53.

Judging from https://performance.wikimedia.org/xenon/svgs/daily/2018-06-26.api.svgz (pre-28th) and https://performance.wikimedia.org/xenon/svgs/daily/2018-07-01.api.svgz , perhaps the prepared parser output is not getting reused between SpamBlacklist and doEditContent() as much as before.

Jul 6 2018, 7:15 PM · MW-1.32-release-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Patch-For-Review, Performance-Team, Release-Engineering-Team

Jul 5 2018

aaron updated the task description for T198239: Rollout use of mcrouter for MediaWiki in production.
Jul 5 2018, 2:18 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team
aaron added a comment to T198239: Rollout use of mcrouter for MediaWiki in production.

Change 440469 merged by jenkins-bot:
[operations/mediawiki-config@master] Make test wikis just write to both nutcracker and mcrouter

https://gerrit.wikimedia.org/r/440469

Jul 5 2018, 2:01 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team
aaron added a comment to T93097: MediaWiki should support SQLite 3.8 (or later).

LCStoreDB should use a separate DB file, as objectcache does (by default, via the installer), if it is to be used at all. Better yet, $wgCacheDirectory can be set so that LCStoreCDB can be used.

Jul 5 2018, 11:14 AM · MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Core-Platform-Team, Patch-For-Review, SQLite, MediaWiki-Database

Jul 4 2018

aaron added a comment to T198156: Server-side deletion of User:LorenzoMilano/sandbox.

@Krinkle Do you think some eval.php magic could do it, or would transaction limits still apply even if we did it via that script? Regards.

The method some people use via eval.php would use the same methods as action=delete or deleteBatch.php would, and is also subject to transaction limits. While one could manually make changes in the database, these limits exist for a reason.

Jul 4 2018, 4:30 PM · Wikimedia-Site-requests
aaron added a comment to T198239: Rollout use of mcrouter for MediaWiki in production.

At this point, I want to just go ahead and deploy https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/440469/ .

Jul 4 2018, 7:46 AM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team

Jul 3 2018

aaron placed T161749: Introduce InterruptMutexManager up for grabs.
Jul 3 2018, 3:44 PM · Core-Platform-Team, Patch-For-Review, TechCom-RFC (TechCom-Approved), User-Daniel, Performance-Team, MediaWiki-General-or-Unknown
aaron placed T197849: Add Grafana dashboard for WANObjectCache statsd up for grabs.
Jul 3 2018, 3:35 PM · Goal, Performance-Team
aaron added a comment to T198239: Rollout use of mcrouter for MediaWiki in production.

+1 to the overall plan; I'd like to see dates attached to the various steps now, so that we can have a clear schedule.

Jul 3 2018, 3:13 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team
aaron added a comment to T198483: Save Timing increased 50% since 2018-06-28 20:53.

Judging from https://performance.wikimedia.org/xenon/svgs/daily/2018-06-26.api.svgz (pre-28th) and https://performance.wikimedia.org/xenon/svgs/daily/2018-07-01.api.svgz , perhaps the prepared parser output is not getting reused between SpamBlacklist and doEditContent() as much as before.

Jul 3 2018, 10:57 AM · MW-1.32-release-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Patch-For-Review, Performance-Team, Release-Engineering-Team

Jul 2 2018

aaron added a comment to T197450: test.wp is using test2.wp's message cache.

Testing around, I see that https://test2.wikipedia.org/wiki/MediaWiki_talk:Msg-override doesn't use the {{msg-override}} message from test.wikipedia.org. Is this task still named appropriately?

Jul 2 2018, 8:46 PM · Performance-Team, Gadgets, MediaWiki-Cache, Operations
aaron added a comment to T190260: Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError" trying to undelete a file.

Still seeing ~ 200 hits per day in Logstash for (DBTransactionSizeError) just within wiki:commonswiki AND method:POST.

Jul 2 2018, 11:34 AM · MW-1.32-release-notes (WMF-deploy-2018-07-24 (1.32.0-wmf.14)), Multimedia, Core-Platform-Team, Patch-For-Review, MediaWiki-Page-deletion, MediaWiki-File-management, Performance-Team
aaron updated subscribers of T190260: Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError" trying to undelete a file.
Jul 2 2018, 11:28 AM · MW-1.32-release-notes (WMF-deploy-2018-07-24 (1.32.0-wmf.14)), Multimedia, Core-Platform-Team, Patch-For-Review, MediaWiki-Page-deletion, MediaWiki-File-management, Performance-Team
aaron closed T196125: php-memcached 3.0 (PHP 7) incompatible with BagOStuff as Resolved.
Jul 2 2018, 11:10 AM · MW-1.30-release-notes, MW-1.31-release-notes, MW-1.29-release-notes, MW-1.27-release-notes, MW-1.31-release, MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Performance-Team, PHP 7.0 support, MediaWiki-Platform-Team, Operations
aaron closed T196125: php-memcached 3.0 (PHP 7) incompatible with BagOStuff, a subtask of T176370: Migrate to PHP 7 in WMF production, as Resolved.
Jul 2 2018, 11:10 AM · Core-Platform-Team, TechCom-RFC (TechCom-Approved), User-ArielGlenn, HHVM, Operations
aaron placed T188801: Migrate wl_notificationtimestamp updates to the job queue up for grabs.
Jul 2 2018, 11:09 AM · MediaWiki-Watchlist, Patch-For-Review, Availability (MediaWiki-MultiDC)
aaron added a comment to T198280: Beta Cluster: Unable to obtain lock via objectcache (memcached add() fails).

As of now, I get (as expected):

Jul 2 2018, 10:41 AM · MW-1.32-release-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Patch-For-Review, Regression, Analytics-EventLogging, Performance-Team, Analytics

Jun 28 2018

aaron added a comment to T198350: Rising lock wait timeout SQL errors upon 1.32.0-wmf.10 group1 deployment.

The DBPerformance log shows a spike during the time wmf-10 was deployed on group1: https://logstash.wikimedia.org/goto/7c86a7d63a305c220a37a3a49844ef2c. The vast majority of entries are for commonswiki. Here are a few examples:

Sub-optimal transaction on DB(s) [10.64.48.23 (commonswiki) (TRX#26770d)]: 
0	0.000436	query-m: INSERT IGNORE INTO `page` (page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_latest,page_len) VALUES ('X') [TRX#26770d]
1	0.001476	query-m: INSERT INTO `blobs_cluster2N` (blob_text) VALUES ('X')
2	0.000313	query-m: INSERT INTO `text` (old_id,old_text,old_flags) VALUES (NULL,'X') [TRX#26770d]
3	0.000497	query-m: INSERT INTO `comment` (comment_hash,comment_text,comment_data) VALUES ('X',NULL) [TRX#26770d]
4	0.000425	query-m: INSERT INTO `revision` (rev_page,rev_parent_id,rev_text_id,rev_minor_edit,rev_timestamp,rev_deleted,rev_len,rev_sha1,rev_comment,rev_user,rev_user_text,rev_content_model,rev_content_format) VALUES ('X',NULL,NULL) [TRX#26770d]
5	15.847574	query-m: INSERT INTO `revision_comment_temp` (revcomment_rev,revcomment_comment_id) VALUES ('X') [TRX#26770d]

This is from https://commons.wikimedia.org/w/index.php?title=File:Portrait_of_Maria_van_Rijswijk_Dutch_School_Rijksdienst_voor_het_Cultureel_Erfgoed_B670.jpg&action=edit

I'm surprised to see blobs_cluster2N in there - ExternalStore's blob tables are generally on a different DB server, no? I'll check whether this may be the cause of the problem somehow.

Jun 28 2018, 5:43 PM · MW-1.32-release-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Multi-Content-Revisions (MCR-SDC Storage Layer - phase 1), DBA, MediaWiki-Database, Wikimedia-log-errors
aaron added a comment to T198280: Beta Cluster: Unable to obtain lock via objectcache (memcached add() fails).

I get:

Jun 28 2018, 8:32 AM · MW-1.32-release-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Patch-For-Review, Regression, Analytics-EventLogging, Performance-Team, Analytics

Jun 27 2018

aaron added a comment to T198280: Beta Cluster: Unable to obtain lock via objectcache (memcached add() fails).

Maybe for certain keys, when the mc server used by nutcracker matches that of the mcrouter one, add() fails since it has to succeed on *all* backends for MultiWriteBagOStuff to return true and the second write would see the first one.
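
The hypothesis above can be sketched with in-memory stand-ins. Everything here is hypothetical (the `FakeMemcached` and `Proxy` classes, the routing, and the key names); it only assumes that a multi-write `add()` reports success when every backend's ADD succeeds.

```python
class FakeMemcached:
    """In-memory stand-in for a single memcached server."""

    def __init__(self):
        self.data = {}

    def add(self, key, value):
        # Like memcached ADD: store only if the key is absent.
        if key in self.data:
            return False
        self.data[key] = value
        return True


class Proxy:
    """Stand-in for a proxy (nutcracker or mcrouter) that picks a server per key."""

    def __init__(self, route):
        self.route = route  # callable: key -> FakeMemcached

    def add(self, key, value):
        return self.route(key).add(key, value)


def multi_write_add(backends, key, value):
    """Return True only if ADD succeeded on every backend."""
    results = [backend.add(key, value) for backend in backends]
    return all(results)


shared = FakeMemcached()
other = FakeMemcached()

nutcracker = Proxy(lambda key: shared)
mcrouter_same = Proxy(lambda key: shared)  # both proxies hit the same server
mcrouter_diff = Proxy(lambda key: other)   # proxies hit different servers

overlap_result = multi_write_add([nutcracker, mcrouter_same], "lock:a", 1)
disjoint_result = multi_write_add([nutcracker, mcrouter_diff], "lock:b", 1)
print(overlap_result, disjoint_result)
```

When both proxies resolve the key to the same server, the second ADD sees the first write and fails, so the combined call reports failure even though the lock key was stored.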

Jun 27 2018, 10:41 PM · MW-1.32-release-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Patch-For-Review, Regression, Analytics-EventLogging, Performance-Team, Analytics
aaron added a comment to T198280: Beta Cluster: Unable to obtain lock via objectcache (memcached add() fails).

How is $cache defined?

Jun 27 2018, 10:33 PM · MW-1.32-release-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Patch-For-Review, Regression, Analytics-EventLogging, Performance-Team, Analytics
aaron added a comment to T198280: Beta Cluster: Unable to obtain lock via objectcache (memcached add() fails).
Jun 27 2018, 6:12 PM · MW-1.32-release-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Patch-For-Review, Regression, Analytics-EventLogging, Performance-Team, Analytics
aaron added a comment to T198280: Beta Cluster: Unable to obtain lock via objectcache (memcached add() fails).

I can't reproduce this in eval.php (after https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/442245/).

Jun 27 2018, 6:12 PM · MW-1.32-release-notes (WMF-deploy-2018-06-26 (1.32.0-wmf.10)), Patch-For-Review, Regression, Analytics-EventLogging, Performance-Team, Analytics

Jun 26 2018

Imarlier awarded T198239: Rollout use of mcrouter for MediaWiki in production a Love token.
Jun 26 2018, 7:46 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team
aaron created T198239: Rollout use of mcrouter for MediaWiki in production.
Jun 26 2018, 6:38 PM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team

Jun 25 2018

aaron added a comment to T118893: Consider using APC for the individually cached keys (e.g. 'TOO BIG') in MessageCache.

Looking at https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/441601/

I assume the reason they're not fine for APC is not because of access control (e.g. vary by user or session), but rather that the content pre-filled there might change without a purge strategy?

Jun 25 2018, 6:01 AM · MW-1.32-release-notes (WMF-deploy-2018-07-31 (1.32.0-wmf.15)), Patch-For-Review, Performance-Team, MediaWiki-Cache

Jun 23 2018

aaron added a comment to T118893: Consider using APC for the individually cached keys (e.g. 'TOO BIG') in MessageCache.

For the case of messages with no page definitions but rather hook-based definitions:
Looking at MWOAuthUIHooks::onMessagesPreLoad, I see that the use of MWOAuthDAOAccessControl would be problematic for APC use, though a process cache could still be used.

Jun 23 2018, 4:07 PM · MW-1.32-release-notes (WMF-deploy-2018-07-31 (1.32.0-wmf.15)), Patch-For-Review, Performance-Team, MediaWiki-Cache
aaron renamed T118893: Consider using APC for the individually cached keys (e.g. 'TOO BIG') in MessageCache from Consider using APC for 'TOO BIG'/'NONEXISTENT' keys and for non-existing messages that have no i18n default in MessageCache to Consider using APC for the individually cached keys (e.g. 'TOO BIG') in MessageCache.
Jun 23 2018, 3:53 PM · MW-1.32-release-notes (WMF-deploy-2018-07-31 (1.32.0-wmf.15)), Patch-For-Review, Performance-Team, MediaWiki-Cache
aaron renamed T118893: Consider using APC for the individually cached keys (e.g. 'TOO BIG') in MessageCache from Consider using APC for 'TOO BIG' keys and for non-existing messages that have no i18n default in MessageCache to Consider using APC for 'TOO BIG'/'NONEXISTENT' keys and for non-existing messages that have no i18n default in MessageCache.
Jun 23 2018, 3:39 PM · MW-1.32-release-notes (WMF-deploy-2018-07-31 (1.32.0-wmf.15)), Patch-For-Review, Performance-Team, MediaWiki-Cache
aaron renamed T118893: Consider using APC for the individually cached keys (e.g. 'TOO BIG') in MessageCache from Consider using APC for 'NONEXISTENT' and 'TOO BIG' as well in MessageCache to Consider using APC for 'TOO BIG' keys and for non-existing messages that have no i18n default in MessageCache.
Jun 23 2018, 3:12 PM · MW-1.32-release-notes (WMF-deploy-2018-07-31 (1.32.0-wmf.15)), Patch-For-Review, Performance-Team, MediaWiki-Cache
aaron added a comment to T102793: Bash tools with histograms, trends, and "field" tool should be available to all users on fluorine.

I believe that is the script, yes, though I can't seem to figure out how to get it to format properly (the histogram bars are all the same size).

Jun 23 2018, 9:57 AM · Performance-Team

Jun 15 2018

aaron closed T184525: Explicitly providing a database index to LoadBalancer::getConnection() should return the selected connection. as Resolved.

Fixed in daf0514345f03189187606ba2323794588c79dc9 .

Jun 15 2018, 6:44 PM · Performance-Team, MediaWiki-Database

Jun 14 2018

aaron closed T193668: Transaction should be in the callback stage (not 'cursory') as Resolved.
Jun 14 2018, 7:18 PM · Performance-Team, MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), MediaWiki-Database, Wikimedia-log-errors
aaron closed T193668: Transaction should be in the callback stage (not 'cursory'), a subtask of T41480: Issues affecting translatewiki.net, as Resolved.
Jun 14 2018, 7:18 PM · Tracking, MediaWiki-General-or-Unknown
aaron added a comment to T197125: MediaWiki deadlock when multiple files of same SHA1 are deleted simultaneously.

For web requests, the lock timeout should be 5 min:

Jun 14 2018, 12:32 AM · Multimedia, Commons, MediaWiki-Page-deletion, MediaWiki-File-management

Jun 13 2018

aaron added a comment to T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index..

That said, from mc1019, I see:

Jun 13 2018, 9:04 AM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Release-Engineering-Team (Watching / External), Performance-Team, MediaWiki-Database, Wikimedia-log-errors
jcrespo awarded T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index. a Love token.
Jun 13 2018, 9:02 AM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Release-Engineering-Team (Watching / External), Performance-Team, MediaWiki-Database, Wikimedia-log-errors
aaron added a comment to T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index..

Please excuse my ignorance, but you are talking about redis for sessions, not for the jobqueue (which is, or is close to being, deprecated, right?).

Jun 13 2018, 8:53 AM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Release-Engineering-Team (Watching / External), Performance-Team, MediaWiki-Database, Wikimedia-log-errors
aaron added a comment to T190082: 5-second latency for certain API calls?.

I suggest that cookie set/receive round-tripping should be tested for encoding/truncation issues with @ or # for these apps, as well as letter case changes or such. The above patch simply discards cookie headers for cpPosIndex that are botched.

Jun 13 2018, 7:33 AM · MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Wikipedia-Android-App-Backlog (Android-app-release-v2.7.24x-I-Ice-lolly), Patch-For-Review, MediaWiki-Database, Performance-Team, Android-app-Bugs
aaron added a comment to T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index..

There are now only ~10/min of these. I still see no 'redis' channel errors, but I wonder if the random eviction model of redis is at play. redis 3.0 is a bit better at LRU per https://redis.io/topics/lru-cache than our 2.8.

Jun 13 2018, 7:26 AM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Release-Engineering-Team (Watching / External), Performance-Team, MediaWiki-Database, Wikimedia-log-errors
aaron added a comment to T187951: Intermittent "Error loading data from server" error using VE on officewiki.

Does this still happen after https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/440031/ ?

Jun 13 2018, 1:41 AM · Performance-Team, VisualEditor (Current work)
Krinkle awarded T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index. an Orange Medal token.
Jun 13 2018, 1:37 AM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Release-Engineering-Team (Watching / External), Performance-Team, MediaWiki-Database, Wikimedia-log-errors
aaron added a comment to T190082: 5-second latency for certain API calls?.

Seems to have gotten better after https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/440031/ was deployed and likewise for logstash (+"ChronologyProtector::initPositions").

Jun 13 2018, 1:25 AM · MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Wikipedia-Android-App-Backlog (Android-app-release-v2.7.24x-I-Ice-lolly), Patch-For-Review, MediaWiki-Database, Performance-Team, Android-app-Bugs

Jun 12 2018

aaron added a comment to T95799: wfWaitForSlaves in JobRunner can massively slow down run rate if just a single slave is lagged.

I don't understand how we can implement the task as described. It's intentional that write-heavy maintenance scripts go at the speed of the slowest slave. If you only wait for a majority then you could have 50% of slaves permanently lagged, potentially by days or weeks.

Jun 12 2018, 9:13 AM · Availability, Performance-Team (Radar), DBA, MediaWiki-Database

Jun 11 2018

Gerrit Code Review <gerrit@wikimedia.org> committed rELINT81eafaa9d804: Update patch set 7 (authored by aaron).
Update patch set 7
Jun 11 2018, 2:21 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rELINT61e5d3d2b984: Update patch set 7 (authored by aaron).
Update patch set 7
Jun 11 2018, 2:21 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rELINT34d38684c546: Update patch set 3 (authored by aaron).
Update patch set 3
Jun 11 2018, 2:21 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rELINT3426b0cd1035: Update patch set 3 (authored by aaron).
Update patch set 3
Jun 11 2018, 2:21 PM