aaron (Aaron Schulz)
User

User Details

User Since
Oct 20 2014, 5:25 PM (191 w, 4 d)
Availability
Available
IRC Nick
AaronSchulz
LDAP User
Aaron Schulz
MediaWiki User
Aaron Schulz

Recent Activity

Fri, Jun 15

aaron closed T184525: Explicitly providing a database index to LoadBalancer::getConnection() should return the selected connection. as Resolved.

Fixed in daf0514345f03189187606ba2323794588c79dc9 .

Fri, Jun 15, 6:44 PM · Performance-Team, MediaWiki-Database

Thu, Jun 14

aaron closed T193668: Transaction should be in the callback stage (not 'cursory') as Resolved.
Thu, Jun 14, 7:18 PM · Performance-Team, MW-1.32-release-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), MediaWiki-Database, Wikimedia-log-errors
aaron closed T193668: Transaction should be in the callback stage (not 'cursory'), a subtask of T41480: Issues affecting translatewiki.net, as Resolved.
Thu, Jun 14, 7:18 PM · Tracking, MediaWiki-General-or-Unknown
aaron added a comment to T197125: MediaWiki deadlock when multiple files of same SHA1 are deleted simultaneously.

For web requests, the lock timeout should be 5 min:

Thu, Jun 14, 12:32 AM · Multimedia, Commons, MediaWiki-Page-deletion, MediaWiki-File-management

Wed, Jun 13

aaron added a comment to T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index..

That said, from mc1019, I see:

Wed, Jun 13, 9:04 AM · MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Release-Engineering-Team (Watching / External), Patch-For-Review, Performance-Team, MediaWiki-Database, Wikimedia-log-errors
jcrespo awarded T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index. a Love token.
Wed, Jun 13, 9:02 AM · MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Release-Engineering-Team (Watching / External), Patch-For-Review, Performance-Team, MediaWiki-Database, Wikimedia-log-errors
aaron added a comment to T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index..

Please excuse my ignorance, but you are talking about redis for sessions, not for the jobqueue (which is, or is close to being, deprecated, right?).

Wed, Jun 13, 8:53 AM · MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Release-Engineering-Team (Watching / External), Patch-For-Review, Performance-Team, MediaWiki-Database, Wikimedia-log-errors
aaron added a comment to T190082: 5-second latency for certain API calls?.

I suggest that cookie set/receive round-tripping should be tested for encoding/truncation issues with @ or # for these apps, as well as letter case changes or such. The above patch simply discards cookie headers for cpPosIndex that are botched.

Wed, Jun 13, 7:33 AM · MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Wikipedia-Android-App-Backlog (Android-app-release-v2.7.24x-I-Ice-lolly), Patch-For-Review, MediaWiki-Database, Performance-Team, Android-app-Bugs
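
The comment above describes discarding cpPosIndex cookie values that were mangled in transit. A minimal Python sketch of that idea follows. It assumes, for illustration only, a value shaped like "<index>@<unix time>#<32 lowercase hex chars>"; that layout, the parse_cp_pos_index name and the 10-second TTL are hypothetical, not MediaWiki's actual code. A value whose "@" or "#" was percent-encoded or truncated, or whose letter case changed, fails the exact match and is ignored. A value older than the TTL is dropped as well.

```python
import re
import time
from typing import NamedTuple, Optional

# Hypothetical layout: "<index>@<unix time>#<32 lowercase hex chars>".
CP_COOKIE_RE = re.compile(r"(\d+)@(\d+)#([0-9a-f]{32})")


class CPInfo(NamedTuple):
    index: int
    timestamp: int
    client_id: str


def parse_cp_pos_index(value: str, now: Optional[float] = None,
                       ttl: int = 10) -> Optional[CPInfo]:
    """Return the parsed cookie, or None if it is malformed or expired."""
    match = CP_COOKIE_RE.fullmatch(value)
    if match is None:
        # Botched round-trip (encoded "@"/"#", truncation, case change): ignore it.
        return None
    info = CPInfo(int(match.group(1)), int(match.group(2)), match.group(3))
    current = time.time() if now is None else now
    if current - info.timestamp > ttl:
        # A stale cookie resent by the client is treated as absent.
        return None
    return info
```

For example, "3@1000#" followed by a 32-character lowercase hex ID parses at now=1005, while the same value with "%40" and "%23" in place of the separators returns None.
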
aaron added a comment to T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index..

There are now only ~10/min of these. I still see no 'redis' channel errors, but I wonder if the random eviction model of redis is at play. Per https://redis.io/topics/lru-cache, redis 3.0 is a bit better at approximating LRU than our 2.8.

Wed, Jun 13, 7:26 AM · MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Release-Engineering-Team (Watching / External), Patch-For-Review, Performance-Team, MediaWiki-Database, Wikimedia-log-errors
aaron added a comment to T187951: Intermittent "Error loading data from server" error using VE on officewiki.

Does this still happen after https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/440031/ ?

Wed, Jun 13, 1:41 AM · Performance-Team, VisualEditor (Current work)
Krinkle awarded T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index. a Orange Medal token.
Wed, Jun 13, 1:37 AM · MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Release-Engineering-Team (Watching / External), Patch-For-Review, Performance-Team, MediaWiki-Database, Wikimedia-log-errors
aaron added a comment to T190082: 5-second latency for certain API calls?.

Seems to have gotten better after https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/440031/ was deployed, and likewise for logstash (+"ChronologyProtector::initPositions").

Wed, Jun 13, 1:25 AM · MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Wikipedia-Android-App-Backlog (Android-app-release-v2.7.24x-I-Ice-lolly), Patch-For-Review, MediaWiki-Database, Performance-Team, Android-app-Bugs

Tue, Jun 12

aaron added a comment to T95799: wfWaitForSlaves in JobRunner can massively slow down run rate if just a single slave is lagged.

I don't understand how we can implement the task as described. It's intentional that write-heavy maintenance scripts go at the speed of the slowest slave. If you only wait for a majority then you could have 50% of slaves permanently lagged, potentially by days or weeks.

Tue, Jun 12, 9:13 AM · MediaWiki-Platform-Team, Availability, Performance-Team (Radar), DBA, MediaWiki-Database
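
The trade-off in this comment can be illustrated with a toy Python simulation. It is not MediaWiki's wfWaitForSlaves() or LoadBalancer logic; the replica names, per-tick apply rates and batch size are invented for the example. A writer commits batches on the primary and, before each new batch, waits either for every replica to catch up or only for a majority. With one replica slower than the others, waiting for all keeps its lag bounded, while waiting for a majority lets its lag grow with every batch.

```python
from typing import Dict


def simulate(policy: str, batches: int = 20, batch_size: int = 10) -> Dict[str, int]:
    """Return each replica's lag (in writes) after the writer finishes.

    policy is "all" (wait for every replica) or "majority".
    """
    rates = {"db1": 10, "db2": 10, "db3": 1}  # writes applied per tick; db3 is slow
    applied = {name: 0 for name in rates}
    primary = 0

    def caught_up() -> int:
        return sum(1 for name in rates if applied[name] >= primary)

    needed = len(rates) if policy == "all" else len(rates) // 2 + 1
    for _ in range(batches):
        primary += batch_size  # writer commits one batch on the primary
        while caught_up() < needed:  # wait loop: every replica applies per tick
            for name, rate in rates.items():
                applied[name] = min(primary, applied[name] + rate)
    return {name: primary - applied[name] for name in rates}
```

Here simulate("majority") leaves db3 180 writes behind after 20 batches (9 more per batch), and the gap keeps widening the longer the script runs; simulate("all") ends with every replica caught up, at the price of the writer moving at db3's pace.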

Mon, Jun 11

Gerrit Code Review <gerrit@wikimedia.org> committed rELINT81eafaa9d804: Update patch set 7 (authored by aaron).
Update patch set 7
Mon, Jun 11, 2:21 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rELINT61e5d3d2b984: Update patch set 7 (authored by aaron).
Update patch set 7
Mon, Jun 11, 2:21 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rELINT34d38684c546: Update patch set 3 (authored by aaron).
Update patch set 3
Mon, Jun 11, 2:21 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rELINT3426b0cd1035: Update patch set 3 (authored by aaron).
Update patch set 3
Mon, Jun 11, 2:21 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rELINT57230b87b036: Update patch set 2 (authored by aaron).
Update patch set 2
Mon, Jun 11, 2:21 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rELINT0df0d0eda58a: Update patch set 2 (authored by aaron).
Update patch set 2
Mon, Jun 11, 2:21 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rELINT5140efca41f9: Update patch set 2 (authored by aaron).
Update patch set 2
Mon, Jun 11, 2:15 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rELINT5877804b4f02: Update patch set 1 (authored by aaron).
Update patch set 1
Mon, Jun 11, 2:15 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rELINTae84598b8a8c: Update patch set 1 (authored by aaron).
Update patch set 1
Mon, Jun 11, 2:14 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1903:18310a09852d: Update patch set 1 (authored by aaron).
Update patch set 1
Mon, Jun 11, 10:12 AM
Gerrit Code Review <gerrit@wikimedia.org> committed R1903:de8fdf4c1642: Update patch set 1 (authored by aaron).
Update patch set 1
Mon, Jun 11, 10:12 AM
Gerrit Code Review <gerrit@wikimedia.org> committed R1903:ea72a40e577a: Update patch set 1 (authored by aaron).
Update patch set 1
Mon, Jun 11, 10:12 AM
Gerrit Code Review <gerrit@wikimedia.org> committed R1903:a7297307e48a: Update patch set 1 (authored by aaron).
Update patch set 1
Mon, Jun 11, 10:12 AM
Gerrit Code Review <gerrit@wikimedia.org> committed R1903:5ce7d4b4deed: Update patch set 1 (authored by aaron).
Update patch set 1
Mon, Jun 11, 10:12 AM
Gerrit Code Review <gerrit@wikimedia.org> committed R1903:535348fadd8c: Update patch set 1 (authored by aaron).
Update patch set 1
Mon, Jun 11, 10:12 AM
Gerrit Code Review <gerrit@wikimedia.org> committed R1903:46b1b46ae70b: Update patch set 1 (authored by aaron).
Update patch set 1
Mon, Jun 11, 10:12 AM
Gerrit Code Review <gerrit@wikimedia.org> committed R1903:6ef32504bdf5: Update patch set 8 (authored by aaron).
Update patch set 8
Mon, Jun 11, 10:12 AM
Gerrit Code Review <gerrit@wikimedia.org> committed R1903:3943bb8f114a: Update patch set 2 (authored by aaron).
Update patch set 2
Mon, Jun 11, 10:12 AM
Gerrit Code Review <gerrit@wikimedia.org> committed R1903:522c32892173: Update patch set 1 (authored by aaron).
Update patch set 1
Mon, Jun 11, 10:12 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rERXBc4903c9983d4: Create change (authored by aaron).
Create change
Mon, Jun 11, 4:50 AM

Sun, Jun 10

Gerrit Code Review <gerrit@wikimedia.org> committed rEXFA2c26ac76c0f6: Update patch set 2 (authored by aaron).
Update patch set 2
Sun, Jun 10, 8:41 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1981:9789d4ad8f6c: Update patch set 7 (authored by aaron).
Update patch set 7
Sun, Jun 10, 1:53 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1981:b86970b7f5c3: Update patch set 7 (authored by aaron).
Update patch set 7
Sun, Jun 10, 1:53 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1981:008640641dbe: Update patch set 7 (authored by aaron).
Update patch set 7
Sun, Jun 10, 1:53 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1981:3075b8ee66b7: Update patch set 6 (authored by aaron).
Update patch set 6
Sun, Jun 10, 1:53 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1981:a876e64f0744: Update patch set 5 (authored by aaron).
Update patch set 5
Sun, Jun 10, 1:53 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1981:ae628d9766fb: Update patch set 4 (authored by aaron).
Update patch set 4
Sun, Jun 10, 1:53 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1981:f0bce1c7e60f: Update patch set 3 (authored by aaron).
Update patch set 3
Sun, Jun 10, 1:53 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1981:5b940d303635: Update patch set 2 (authored by aaron).
Update patch set 2
Sun, Jun 10, 1:53 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1981:763d6317572f: Update patch set 1 (authored by aaron).
Update patch set 1
Sun, Jun 10, 1:52 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1981:71d1f55af876: Create change (authored by aaron).
Create change
Sun, Jun 10, 1:51 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1984:cadaa6868d1b: Update patch set 2 (authored by aaron).
Update patch set 2
Sun, Jun 10, 1:40 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1984:8583b92ecd74: Update patch set 2 (authored by aaron).
Update patch set 2
Sun, Jun 10, 1:40 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1984:ec9bda826f28: Update patch set 2 (authored by aaron).
Update patch set 2
Sun, Jun 10, 1:40 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1984:ab5f8ee63b6e: Update patch set 2 (authored by aaron).
Update patch set 2
Sun, Jun 10, 1:40 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1984:3051fffb6a40: Update patch set 1 (authored by aaron).
Update patch set 1
Sun, Jun 10, 1:40 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1985:31b677fad2bf: Update patch set 2 (authored by aaron).
Update patch set 2
Sun, Jun 10, 1:23 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1985:674b8abf27a9: Update patch set 2 (authored by aaron).
Update patch set 2
Sun, Jun 10, 1:23 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1985:51295a01dbc1: Update patch set 1 (authored by aaron).
Update patch set 1
Sun, Jun 10, 1:23 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1985:bc9639efc61b: Update patch set 2 (authored by aaron).
Update patch set 2
Sun, Jun 10, 1:23 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1985:3a658b7031d4: Update patch set 1 (authored by aaron).
Update patch set 1
Sun, Jun 10, 1:22 PM
Gerrit Code Review <gerrit@wikimedia.org> committed R1985:22fd9c74bf16: Update patch set 1 (authored by aaron).
Update patch set 1
Sun, Jun 10, 1:22 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rESCC3da145b92ba9: Update patch set 1 (authored by aaron).
Update patch set 1
Sun, Jun 10, 7:18 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rELGNf85353857a17: Update patch set 7 (authored by aaron).
Update patch set 7
Sun, Jun 10, 2:42 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rELGNc69ab56a1290: Update patch set 6 (authored by aaron).
Update patch set 6
Sun, Jun 10, 2:42 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rELGN1d6c1478f9b4: Update patch set 5 (authored by aaron).
Update patch set 5
Sun, Jun 10, 2:42 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rELGNfd4688d2ef6c: Update patch set 1 (authored by aaron).
Update patch set 1
Sun, Jun 10, 2:42 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rELGNa39370bd43a1: Update patch set 2 (authored by aaron).
Update patch set 2
Sun, Jun 10, 2:41 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rELGN8df4cbb41c35: Create patch set 2 (authored by aaron).
Create patch set 2
Sun, Jun 10, 2:41 AM
Gerrit Code Review <gerrit@wikimedia.org> committed rELGNd746b5a9dbf2: Update patch set 1 (authored by aaron).
Update patch set 1
Sun, Jun 10, 2:41 AM

Sat, Jun 9

Gerrit Code Review <gerrit@wikimedia.org> committed rECOG752e3437fbfa: Update patch set 2 (authored by aaron).
Update patch set 2
Sat, Jun 9, 12:02 AM

Fri, Jun 8

Gerrit Code Review <gerrit@wikimedia.org> committed rECOG433df26f08a6: Update patch set 1 (authored by aaron).
Update patch set 1
Fri, Jun 8, 11:56 PM

Wed, Jun 6

aaron added a comment to T91820: Create HTTP verb and sticky cookie DC routing in VCL .

Special:Userlogin starts a session on a GET request so that it can implement CSRF protection on the login form. And that's a localised page name so it's not easy to filter in VCL unless we change the URL in MediaWiki to something more predictable. Not sure if there are other cases, we'd need some sort of audit. If session access is very rare in the secondary DC then could we just tunnel session access to the primary DC, instead of replicating? GET requests causing session creation would be slightly delayed, then the user would get their session cookie and be directed to the primary for subsequent requests.

Wed, Jun 6, 6:47 AM · Services (watching), Wikimania-Hackathon-2018, Availability (MediaWiki-MultiDC), Operations, Traffic

Tue, Jun 5

aaron added a comment to T196303: High rate of "Memcached error .. CONNECTION FAILURE" on snapshot hosts.

From T196125: get* and getMulti* commands now take the Memcached::GET_EXTENDED flag to retrieve user flags and cas tokens: I guess this might be it. There is indeed no cas token returned for the get; thus, no cas token is supplied with the cas command itself; twemproxy appears to expect (require) such a token, though I am not certain of that; see https://github.com/twitter/twemproxy/blob/master/src/proto/nc_memcache.c#L474

Maybe we can get @aaron to verify this.

Tue, Jun 5, 10:11 PM · Wikimedia-log-errors, Operations, Dumps-Generation

Mon, Jun 4

aaron moved T196125: php-memcached 3.0 (PHP 7) incompatible with BagOStuff from Inbox to Doing on the Performance-Team board.
Mon, Jun 4, 7:58 PM · MW-1.30-release-notes, MW-1.31-release-notes, MW-1.29-release-notes, MW-1.27-release-notes, MW-1.31-release, MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Performance-Team, PHP 7.0 support, MediaWiki-Platform-Team, Operations
aaron claimed T196125: php-memcached 3.0 (PHP 7) incompatible with BagOStuff.
Mon, Jun 4, 7:57 PM · MW-1.30-release-notes, MW-1.31-release-notes, MW-1.29-release-notes, MW-1.27-release-notes, MW-1.31-release, MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Performance-Team, PHP 7.0 support, MediaWiki-Platform-Team, Operations

Fri, Jun 1

aaron awarded T192771: mcrouter production architecture a Orange Medal token.
Fri, Jun 1, 9:26 PM · User-Joe, Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations

Tue, May 29

aaron claimed T187951: Intermittent "Error loading data from server" error using VE on officewiki.
Tue, May 29, 8:51 PM · Performance-Team, VisualEditor (Current work)
aaron moved T187951: Intermittent "Error loading data from server" error using VE on officewiki from Inbox to Doing on the Performance-Team board.
Tue, May 29, 8:51 PM · Performance-Team, VisualEditor (Current work)

Thu, May 24

aaron added a comment to T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index..

I suspect, judging from the logs, that some RESTBase service forwards a user's cookies (for permissions) but connects from a local IP. Since the CP position redis key is based on a hash of the client IP and agent, the key will not be found and the lookup will time out. I don't know whether the agent is passed through or not.

Thu, May 24, 6:31 PM · MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Release-Engineering-Team (Watching / External), Patch-For-Review, Performance-Team, MediaWiki-Database, Wikimedia-log-errors
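
The failure mode suspected here can be shown with a small Python sketch. It assumes, hypothetically, that the position key is derived from a hash of the client IP and user agent; the cp_position_key helper, the key prefix, the choice of SHA-256 and the IP addresses are illustrative, not the real ChronologyProtector key scheme. When a service forwards the user's cookies but connects from its own internal IP, the derived key differs from the one the positions were saved under, so the lookup misses.

```python
import hashlib
from typing import Dict, Optional


def cp_position_key(ip: str, user_agent: str) -> str:
    """Hypothetical key scheme: a hash of the client IP and user agent."""
    client_id = hashlib.sha256(f"{ip}\n{user_agent}".encode()).hexdigest()
    return f"cp-positions:{client_id}"


store: Dict[str, Dict[str, int]] = {}

# The user's own request saved replication positions under its IP/agent key.
store[cp_position_key("198.51.100.7", "Mozilla/5.0")] = {"db1": 12345}


def lookup_positions(ip: str, user_agent: str) -> Optional[Dict[str, int]]:
    # A miss here is what would make the real reader wait until its timeout.
    return store.get(cp_position_key(ip, user_agent))
```

A request from 198.51.100.7 finds the saved positions, but the same cookies replayed from an internal address such as 10.64.0.5 get None, even if the agent string is passed through unchanged.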

Wed, May 23

aaron added a comment to T192771: mcrouter production architecture.

The reason for the hybrid proxy approach is that mcrouter is known to use a non-trivial amount of memory when under write pressure, so I wanted to avoid sharing the same machines as memcached itself.

We could certainly consider global proxies in both datacenters in the future; we can reassess at a later time.

Wed, May 23, 9:25 PM · User-Joe, Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations

May 23 2018

aaron added a comment to T192771: mcrouter production architecture.

If SET/DELETE go to all mc* servers in the wancache-(eqiad/codfw) pools (as mediawiki_wancache is configured to do in puppet), then Option B would still work, since the consistent hashing wouldn't matter. Having broadcast operations go to all mc* servers, rather than to just one per DC (based on hash), is not required for WANCache though. Keeping it this way wouldn't scale well if the rate of those (purge) operations increased hugely for some reason. I do like the conceptual simplicity, though.

May 23 2018, 2:54 AM · User-Joe, Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations
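
A short Python sketch of why broadcasting makes the hashing irrelevant. Rendezvous (highest-random-weight) hashing stands in for consistent hashing here, and the pool layout, server names and key are hypothetical; this is not mcrouter's actual routing. When SETs are broadcast, every server holds a copy, so a broadcast DELETE leaves no stale copy for any reader to hit, whichever server its hash picks. A DELETE routed to only one hashed server leaves the other copies behind.

```python
import hashlib
from typing import Dict, List

Pool = Dict[str, Dict[str, str]]  # server name -> that server's key/value data


def pick_server(key: str, names: List[str]) -> str:
    """Rendezvous hashing: the server with the highest hash(server, key) wins."""
    return max(names, key=lambda n: hashlib.sha1(f"{n}:{key}".encode()).hexdigest())


def targets(pool: Pool, key: str, broadcast: bool) -> List[str]:
    # Broadcast ignores hashing entirely; otherwise route to one hashed server.
    return list(pool) if broadcast else [pick_server(key, list(pool))]


def set_value(pool: Pool, key: str, value: str, broadcast: bool) -> None:
    for name in targets(pool, key, broadcast):
        pool[name][key] = value


def delete(pool: Pool, key: str, broadcast: bool) -> None:
    for name in targets(pool, key, broadcast):
        pool[name].pop(key, None)


def copies_left(pool: Pool, key: str) -> int:
    return sum(1 for data in pool.values() if key in data)


def demo(purge_broadcast: bool, servers: int = 4) -> int:
    pool: Pool = {f"mc{i}": {} for i in range(1, servers + 1)}
    set_value(pool, "user:42", "v1", broadcast=True)  # SETs go to every server
    delete(pool, "user:42", broadcast=purge_broadcast)
    return copies_left(pool, "user:42")
```

demo(True) leaves no copies, while demo(False) leaves three of four. The cost is that each broadcast purge touches every server, which is the scaling concern raised above.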

May 14 2018

aaron added a comment to T190082: 5-second latency for certain API calls?.

Per above patch, I've also uncovered a condition in our app where it can potentially send cookies that are expired. Since the lifetime of the cpPosIndex cookie seems to be quite short, this issue is especially applicable to this cookie, and may in fact be the root cause here...

May 14 2018, 11:58 PM · MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Wikipedia-Android-App-Backlog (Android-app-release-v2.7.24x-I-Ice-lolly), Patch-For-Review, MediaWiki-Database, Performance-Team, Android-app-Bugs

May 10 2018

aaron added a comment to T194403: Wikimedia\Rdbms\ChronologyProtector::initPositions: expected but failed to find position index..

The logging level went from INFO to WARNING. I suppose this has been happening for a very long time then.

May 10 2018, 10:21 PM · MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Release-Engineering-Team (Watching / External), Patch-For-Review, Performance-Team, MediaWiki-Database, Wikimedia-log-errors

May 8 2018

aaron removed projects from T97562: WANObjectCache relay daemon or mcrouter support: Analytics, User-mobrovac, EventBus.
May 8 2018, 10:48 PM · MediaWiki-Database, Performance-Team, Services (watching), Availability (MediaWiki-MultiDC)
aaron triaged T194225: Enable mcrouter on the memcached servers themselves as Normal priority.
May 8 2018, 10:48 PM · Patch-For-Review, Performance-Team
aaron added a comment to T193008: MediaWiki\MediaWikiServices::resetChildProcessServices doesn't reset database connection state.

LBFactory does not implement DestructibleService, though it has a destroy() method.

LBFactory used to implement DestructibleService, see https://gerrit.wikimedia.org/r/c/286314/. Was that changed in order to move LBFactory to /libs? Supporting fork was indeed a driving factor behind introducing service resets, and resetting LBFactory was the primary need.

LBFactory could again implement DestructibleService, if we moved includes/services to libs as well. I don't see anything there that depends on MediaWiki. All it would take would be to change the namespace from MediaWiki\Services to Wikimedia\Services (or perhaps more appropriately, Wikimedia\ServiceContainer).

May 8 2018, 8:00 AM · MediaWiki-ServiceContainer, User-Nikerabbit, User-Daniel, MediaWiki-Platform-Team, MediaWiki-extensions-Translate, MediaWiki-Database, Regression

May 2 2018

aaron added a comment to T193008: MediaWiki\MediaWikiServices::resetChildProcessServices doesn't reset database connection state.

Correct usage of ForkController (whose logic ttmserver-export mostly reimplements) works fine.

May 2 2018, 6:56 PM · MediaWiki-ServiceContainer, User-Nikerabbit, User-Daniel, MediaWiki-Platform-Team, MediaWiki-extensions-Translate, MediaWiki-Database, Regression
aaron added a comment to T193008: MediaWiki\MediaWikiServices::resetChildProcessServices doesn't reset database connection state.

Why doesn't that script use ForkController btw?

May 2 2018, 6:30 PM · MediaWiki-ServiceContainer, User-Nikerabbit, User-Daniel, MediaWiki-Platform-Team, MediaWiki-extensions-Translate, MediaWiki-Database, Regression
aaron added a comment to T193008: MediaWiki\MediaWikiServices::resetChildProcessServices doesn't reset database connection state.

LBFactory does not implement DestructibleService, though it has a destroy() method. This is due to it being in /libs. It relies on reference counting: the old service container instance falls out of scope in resetGlobalInstance() when $oldInstance dies, then LBFactory follows suit, triggering __destruct()=>destroy(), and so on. If something holds a ServiceContainer instance (with LBFactory loaded) from before the fork and tries to use it later, it will get a ContainerDisabledException.

May 2 2018, 6:24 PM · MediaWiki-ServiceContainer, User-Nikerabbit, User-Daniel, MediaWiki-Platform-Team, MediaWiki-extensions-Translate, MediaWiki-Database, Regression
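
The teardown chain described in this comment can be modelled in Python, since CPython also frees an object as soon as its reference count drops to zero. This is a hypothetical sketch: the class names echo the comment, but the code is not MediaWiki's ServiceContainer or LBFactory. The container's destroy() only calls destroy() on DestructibleService instances, so LBFactory is torn down only when the last reference to the old container disappears. Code that kept its own pre-fork reference gets an exception when it uses the disabled container.

```python
from typing import Dict, List


class ContainerDisabledException(Exception):
    pass


class DestructibleService:
    def destroy(self) -> None:  # overridden by services needing explicit teardown
        pass


class LBFactory:  # deliberately NOT a DestructibleService, as in the comment
    log: List[str] = []

    def destroy(self) -> None:
        type(self).log.append("LBFactory.destroy")

    def __del__(self) -> None:
        self.destroy()  # reached only once the last reference disappears


class ServiceContainer:
    def __init__(self) -> None:
        self.services: Dict[str, object] = {"DBLoadBalancerFactory": LBFactory()}
        self.destroyed = False

    def get_service(self, name: str) -> object:
        if self.destroyed:
            raise ContainerDisabledException(name)
        return self.services[name]

    def destroy(self) -> None:
        for service in self.services.values():
            if isinstance(service, DestructibleService):  # skips LBFactory
                service.destroy()
        self.destroyed = True


_instance = ServiceContainer()


def reset_global_instance() -> None:
    global _instance
    old_instance, _instance = _instance, ServiceContainer()
    old_instance.destroy()
    # On return, old_instance falls out of scope; if nothing else refers to it,
    # refcounting frees it, then its LBFactory, triggering __del__ -> destroy().


def demo_reset() -> List[str]:
    LBFactory.log.clear()
    reset_global_instance()
    return list(LBFactory.log)


def demo_held_reference() -> bool:
    held = _instance  # e.g. a container grabbed before forking
    reset_global_instance()
    try:
        held.get_service("DBLoadBalancerFactory")
    except ContainerDisabledException:
        return True
    return False
```

demo_reset() shows the cascade reaching LBFactory.destroy(), and demo_held_reference() shows the disabled-container error for a retained reference.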

May 1 2018

aaron created T193565: Rare "Table 'centralauth.page' doesn't exist" errors.
May 1 2018, 8:05 PM · Wikimedia-log-errors, MediaWiki-Database

Apr 30 2018

aaron closed T151466: Performance Q2 2017/18 goal: Install and use mcrouter in deployment-prep, a subtask of T175213: 2017/18 Annual Plan Program 8: Multi-datacenter support, Q2 goals, as Resolved.
Apr 30 2018, 7:57 PM · MediaWiki-Platform-Team, Performance-Team (Radar), Epic, Operations, Services (watching)
aaron closed T151466: Performance Q2 2017/18 goal: Install and use mcrouter in deployment-prep as Resolved.

This has now been running for a while (since Apr 17) with the new packages (both debian versions, though the stretch server isn't there anymore afaik).

Apr 30 2018, 7:57 PM · Release-Engineering-Team (Watching / External), Availability (MediaWiki-MultiDC), Beta-Cluster-Infrastructure, Performance-Team

Apr 27 2018

aaron updated the task description for T193271: Handle large MessageCache key values in memcached.
Apr 27 2018, 8:35 PM · MediaWiki-Cache, Performance-Team
aaron created T193271: Handle large MessageCache key values in memcached.
Apr 27 2018, 8:11 PM · MediaWiki-Cache, Performance-Team
aaron added a comment to T192473: deployment-prep has jobqueue issues.

refreshLinks2 is not used anymore. Since it is not in $wgJobClasses anymore, they probably won't get cleared in recycleAndDeleteStaleJobs().

Apr 27 2018, 3:33 AM · Services, Release-Engineering-Team, MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Patch-For-Review, Puppet, Beta-Cluster-Infrastructure

Apr 23 2018

aaron closed T160910: File moves throw error when using postgres as Resolved.
Apr 23 2018, 10:10 PM · Performance-Team, MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), MW-1.31-release-notes (WMF-deploy-2018-04-17 (1.31.0-wmf.30)), MediaWiki-Database, MediaWiki-File-management, PostgreSQL, Multimedia

Apr 19 2018

aaron added a comment to T192584: Error occurs in file page for Own uploaded files@1.31.0-wmf.30 (e8360e8).

I assume daf0514345f03 exposed this bug.

Apr 19 2018, 7:51 PM · MediaWiki-Database, MW-1.27-release-notes, MW-1.30-release-notes, MW-1.29-release-notes, MW-1.31-release-notes, MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1))
aaron added a comment to T192473: deployment-prep has jobqueue issues.

The warnings are pointless; the patch above adds an isset() check.

Apr 19 2018, 4:42 AM · Services, Release-Engineering-Team, MW-1.32-release-notes (WMF-deploy-2018-04-24 (1.32.0-wmf.1)), Patch-For-Review, Puppet, Beta-Cluster-Infrastructure

Apr 12 2018

aaron added a comment to T191802: [Epic] Determine a strategy to store files between 5 and 100 Gb.

This is related to T149847 in that we would *have* to stop moving file content around in Special:MovePage just to rename files.

Apr 12 2018, 5:35 PM · media-storage, Multimedia
aaron added a parent task for T149847: RFC: Use content hash based image / thumb URLs: T191802: [Epic] Determine a strategy to store files between 5 and 100 Gb.
Apr 12 2018, 5:35 PM · Services (later), Traffic, Operations, TechCom-RFC, Zero, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Reading-Admin, Commons, Epic, RESTBase-API, Parsoid, Multimedia, MediaWiki-File-management
aaron added a subtask for T191802: [Epic] Determine a strategy to store files between 5 and 100 Gb: T149847: RFC: Use content hash based image / thumb URLs.
Apr 12 2018, 5:34 PM · media-storage, Multimedia

Apr 11 2018

aaron added a comment to T191916: Warning: Destructor threw an object exception: exception 'Wikimedia\Rdbms\DBUnexpectedError' with message 'Wikimedia\Rdbms\Database::close: Expected mass commit of all peer transactions (DBO_TRX set).' in /srv/mediawiki/php-1.31.0-wmf.29/includes/libs/rdbms/database/Database.php:3602.

I suspect the transactions are just empty ones with SELECT statements, which don't need to give errors here.

Apr 11 2018, 5:28 AM · MW-1.31-release-notes (WMF-deploy-2018-04-10 (1.31.0-wmf.29)), Performance-Team, MediaWiki-Database, Patch-For-Review, Wikimedia-log-errors

Apr 10 2018

aaron added a comment to T175834: TranslatablePageMoveJob commit while in atomic sections.

The message index code could do with a large amount of rework. In the meantime, I can't tell why the MessageIndexRebuildJob::newJob() instance must run immediately in isValid()... it's not as if the method rechecks what it did before after the rebuild. If nothing else depends on it being immediate, then it should use a DeferredUpdate. If it has to be immediate... then CONN_TRX_AUTO can be considered (as long as it doesn't deadlock by having two transactions update the same rows).

Apr 10 2018, 11:13 PM · MW-1.32-release-notes (WMF-deploy-2018-05-01 (1.32.0-wmf.2)), Language-2018-Apr-June, Language-Team, MediaWiki-extensions-Translate, Wikimedia-log-errors
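
The DeferredUpdate suggestion can be modelled in a few lines of Python. This is a hypothetical sketch, not MediaWiki's DeferredUpdates class or the Translate extension's isValid()/MessageIndexRebuildJob code; the Request class, function names and log strings are invented. Work queued as a deferred update runs only after the main transaction commits, whereas running it immediately executes it inside the caller's open transaction, which is the situation this task is about.

```python
from typing import Callable, List


class Request:
    """Hypothetical request state: one main transaction plus a deferred queue."""

    def __init__(self) -> None:
        self.trx_open = False
        self.log: List[str] = []
        self.deferred: List[Callable[[], None]] = []

    def begin(self) -> None:
        self.trx_open = True

    def commit(self) -> None:
        self.trx_open = False
        self.log.append("commit")
        # Deferred updates run only now, outside the caller's transaction.
        while self.deferred:
            self.deferred.pop(0)()


def rebuild_message_index(req: Request) -> None:
    # Hypothetical stand-in for the rebuild; records whether it ran mid-transaction.
    req.log.append("rebuild inside trx" if req.trx_open else "rebuild after commit")


def is_valid(req: Request, defer: bool) -> bool:
    if defer:
        req.deferred.append(lambda: rebuild_message_index(req))
    else:
        rebuild_message_index(req)  # immediate: runs inside the open transaction
    return True


def demo(defer: bool) -> List[str]:
    req = Request()
    req.begin()
    is_valid(req, defer)
    req.commit()
    return req.log
```

demo(True) records the rebuild after the commit, while demo(False) records it inside the still-open transaction.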

Apr 9 2018

aaron added a subtask for T151466: Performance Q2 2017/18 goal: Install and use mcrouter in deployment-prep: T190979: build new version of mcrouter package.
Apr 9 2018, 6:52 PM · Release-Engineering-Team (Watching / External), Availability (MediaWiki-MultiDC), Beta-Cluster-Infrastructure, Performance-Team
aaron added a parent task for T190979: build new version of mcrouter package: T151466: Performance Q2 2017/18 goal: Install and use mcrouter in deployment-prep.
Apr 9 2018, 6:52 PM · Patch-For-Review, User-Joe, Operations

Apr 4 2018

aaron added a comment to T190960: 1.31.0-wmf.27 rolled back due to increase in fatals: "Replication wait failed: lost connection to MySQL server during query".

@aaron - if you don't like the current model, should we think of an alternative, simpler one based on the heartbeat table, or is this still ok for you?

Apr 4 2018, 9:08 PM · Wikimedia-Incident, MW-1.31-release-notes (WMF-deploy-2018-03-27 (1.31.0-wmf.27)), User-notice, Patch-For-Review, DBA, Wikimedia-log-errors