aaron (Aaron Schulz)
User

User Details

User Since
Oct 20 2014, 5:25 PM (342 w, 3 d)
Availability
Available
IRC Nick
AaronSchulz
LDAP User
Aaron Schulz
MediaWiki User
Aaron Schulz

Recent Activity

Wed, May 5

aaron added a comment to T269161: Disallow direct "BEGIN"/"COMMIT"/"ROLLBACK" via Database::query().

Note that sql.php should still work.

Wed, May 5, 10:35 PM · Performance-Team, Wikimedia-Rdbms, Platform Engineering
aaron placed T266904: Performance review of ext:StopForumSpam up for grabs.

Unassigned, unless there is a clear maintainer to review that patch and do any future upkeep.

Wed, May 5, 10:35 PM · Patch-For-Review, user-sbassett, Performance-Team

Tue, May 4

aaron committed rEOAR945a560ba279: Simplify upsert() call by removing unreferenced AUTOINCREMENT column (authored by aaron).
Simplify upsert() call by removing unreferenced AUTOINCREMENT column
Tue, May 4, 7:42 PM

Thu, Apr 29

aaron closed T150506: Avoid lazyImportLocalNames() master writes on GET requests (Run a script to backfill them once for all) as Resolved.

It finished.

Thu, Apr 29, 5:40 PM · MW-1.37-notes (1.37.0-wmf.4; 2021-05-04), MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), User-Majavah, Platform Team Workboards (Clinic Duty Team), Performance-Team (Radar), Sustainability (MediaWiki-MultiDC), MediaWiki-extensions-CentralAuth
aaron closed T150506: Avoid lazyImportLocalNames() master writes on GET requests (Run a script to backfill them once for all), a subtask of T154552: ApiLogin should not open master connection to centralauth DB, as Resolved.
Thu, Apr 29, 5:40 PM · Sustainability (MediaWiki-MultiDC), MediaWiki-Authentication-and-authorization, MediaWiki-extensions-CentralAuth
aaron added a comment to T278392: Storage solution for cross-datacenter tokens.

Note that the backing store can be moved again later on, making it easy to use mcrouter first.

Thu, Apr 29, 1:59 AM · Patch-For-Review, MediaWiki-extensions-OAuth, ConfirmEdit (CAPTCHA extension), MediaWiki-extensions-CentralAuth, Sustainability (MediaWiki-MultiDC), Performance-Team

Wed, Apr 28

aaron added a comment to T150506: Avoid lazyImportLocalNames() master writes on GET requests (Run a script to backfill them once for all).

Running it again since the screen is gone...

Wed, Apr 28, 2:33 PM · MW-1.37-notes (1.37.0-wmf.4; 2021-05-04), MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), User-Majavah, Platform Team Workboards (Clinic Duty Team), Performance-Team (Radar), Sustainability (MediaWiki-MultiDC), MediaWiki-extensions-CentralAuth

Tue, Apr 27

aaron renamed T274174: Add a write timestamp field and flags field to objectcache table from Add a write timestamp field to objectcache table to Add a write timestamp field and flags field to objectcache table.
Tue, Apr 27, 5:53 PM · MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Patch-For-Review, Performance-Team, MediaWiki-Cache

Mon, Apr 19

aaron added a comment to T90875: Convert tests/phpunit/phpunit.php entrypoint to plain PHPUnit with bootstrap file.

Using env vars seems OK for now.

Mon, Apr 19, 6:33 PM · Performance-Team (Radar), Patch-For-Review, User-kostajh, Code-Health-Metrics, Technical-Debt, MediaWiki-Core-Tests

Wed, Apr 14

aaron added a comment to T90875: Convert tests/phpunit/phpunit.php entrypoint to plain PHPUnit with bootstrap file.

There could be a FileBackendTestBase with subclasses for each backend. The "proxy" backend classes (FileBackendMultiWrite) could just use MemoryFileBackend instances. The tests for MemoryFileBackend would not need any config. The FSFileBackend subclass could just use the tmp directory. The other FileBackendStore subclass would need site config pointing to a real backend...

Wed, Apr 14, 9:50 PM · Performance-Team (Radar), Patch-For-Review, User-kostajh, Code-Health-Metrics, Technical-Debt, MediaWiki-Core-Tests

Apr 13 2021

aaron created T280071: WANObjectCache calls to setNewPreparedValues() cannot have serialized values reused.
Apr 13 2021, 7:44 PM · MediaWiki-Cache, Performance-Team

Apr 9 2021

aaron added a comment to T277834: Two-level session storage and the consistency problem with serialized blob stores.

Possibly related is https://gerrit.wikimedia.org/r/c/mediawiki/core/+/659617 (Last-Write-Wins updates for subkeys within a key).

Apr 9 2021, 7:04 PM · Performance-Team (Radar), MediaWiki-Authentication-and-authorization

Apr 5 2021

aaron committed rELGN012f47bf328c: Switch checkAndIncKey() to using BagOStuff::incrWithInit() (authored by aaron).
Switch checkAndIncKey() to using BagOStuff::incrWithInit()
Apr 5 2021, 1:07 AM

Apr 2 2021

aaron added a comment to T235554: MediaWiki::outputResponsePayload seemingly causes net::ERR_HTTP2_PROTOCOL_ERROR 200 and compression issues in 1.35.

The headers_sent() checks should handle those, though maybe something is checked in the wrong place.

Apr 2 2021, 11:21 PM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), MW-1.35-notes, MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, wbstack, MW-1.35-release, Regression, MediaWiki-General, Performance-Team, Anti-Harassment, Cloud-VPS

Mar 30 2021

aaron closed T235554: MediaWiki::outputResponsePayload seemingly causes net::ERR_HTTP2_PROTOCOL_ERROR 200 and compression issues in 1.35 as Resolved.

Closing given the updates to git master and REL1_35

Mar 30 2021, 2:24 PM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), MW-1.35-notes, MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, wbstack, MW-1.35-release, Regression, MediaWiki-General, Performance-Team, Anti-Harassment, Cloud-VPS
aaron closed T235554: MediaWiki::outputResponsePayload seemingly causes net::ERR_HTTP2_PROTOCOL_ERROR 200 and compression issues in 1.35, a subtask of T269516: Content-Encoding set to none/identity after upgrade to 1.35, as Resolved.
Mar 30 2021, 2:23 PM · Performance-Team (Radar), MW-1.35-release, MediaWiki-General

Mar 25 2021

aaron added a comment to T120242: Consistent MediaWiki state change events | MediaWiki events as source of truth.

Keep in mind that, strictly speaking, some of these problems are not even solved in MediaWiki for "core" DB shards. These include the "main" s[1-8] shards and the "extension" x1 shard. For example, a web request might update S1 (enwiki) and S7 (centralauth) in one "transaction round", which just means that each of the relevant DB connections is checked for connectivity (pinged if there was no activity within the last second) and, once that passes, the COMMITs are made in rapid succession. It is still possible, though very unlikely, that a proper subset of the transactions fails. Also, some events might be triggered from onTransaction() callbacks or PRESEND deferred updates.
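
A minimal sketch of such a "transaction round", written in Python purely for illustration. The FakeConnection class and commit_round function are hypothetical stand-ins rather than MediaWiki's Rdbms API, and the sketch pings every connection unconditionally instead of only idle ones. It shows how a failure after the first COMMIT leaves only a proper subset of the transactions committed:

```python
class FakeConnection:
    """Hypothetical stand-in for a DB connection; not MediaWiki's Rdbms API."""

    def __init__(self, name, commit_ok=True):
        self.name = name
        self.commit_ok = commit_ok
        self.committed = False

    def ping(self):
        # Connectivity check; always succeeds in this sketch.
        return True

    def commit(self):
        if not self.commit_ok:
            raise ConnectionError(f"lost connection to {self.name} during COMMIT")
        self.committed = True


def commit_round(connections):
    """Ping every connection first, then COMMIT each one in quick succession.

    Returns (committed_names, failed_names). This is not two-phase commit:
    a failure after the first COMMIT leaves the earlier commits in place.
    """
    for conn in connections:
        if not conn.ping():
            raise ConnectionError(f"{conn.name} unreachable; nothing committed")
    committed, failed = [], []
    for conn in connections:
        try:
            conn.commit()
            committed.append(conn.name)
        except ConnectionError:
            failed.append(conn.name)
    return committed, failed


# Simulate the unlikely case: s1 commits, then s7 fails mid-round.
committed, failed = commit_round(
    [FakeConnection("s1"), FakeConnection("s7", commit_ok=False)]
)
print(committed, failed)
```

The ping phase narrows the window for failure but cannot close it, which is why the partial-commit outcome remains possible.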

Mar 25 2021, 11:23 PM · DBA, WMF-Architecture-Team, Platform Team Legacy (Later), Event-Platform, Analytics, Services (later)
aaron closed T95501: Fix causes of replica lag and get it to under 5 seconds at peak, a subtask of T3268: Database replication lag issues (tracking), as Resolved.
Mar 25 2021, 12:20 AM · Wikimedia-Rdbms, DBA, Tracking-Neverending
aaron closed T95501: Fix causes of replica lag and get it to under 5 seconds at peak, a subtask of T108551: Database locked error while publishing article using CX, as Resolved.
Mar 25 2021, 12:20 AM · WorkType-Maintenance, ContentTranslation
aaron closed T95501: Fix causes of replica lag and get it to under 5 seconds at peak as Resolved.
Mar 25 2021, 12:20 AM · Performance-Team, Goal, Sustainability

Mar 24 2021

aaron created T278392: Storage solution for cross-datacenter tokens.
Mar 24 2021, 10:56 PM · Patch-For-Review, MediaWiki-extensions-OAuth, ConfirmEdit (CAPTCHA extension), MediaWiki-extensions-CentralAuth, Sustainability (MediaWiki-MultiDC), Performance-Team

Mar 22 2021

aaron added a comment to T266904: Performance review of ext:StopForumSpam.

I rebased the patch above. Once this is merged, I can consider this task closed.

Mar 22 2021, 11:39 PM · Patch-For-Review, user-sbassett, Performance-Team
aaron added a comment to T150506: Avoid lazyImportLocalNames() master writes on GET requests (Run a script to backfill them once for all).

I finished running this on labs via:

Mar 22 2021, 11:38 PM · MW-1.37-notes (1.37.0-wmf.4; 2021-05-04), MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), User-Majavah, Platform Team Workboards (Clinic Duty Team), Performance-Team (Radar), Sustainability (MediaWiki-MultiDC), MediaWiki-extensions-CentralAuth

Mar 19 2021

TiltedCerebellum awarded T235554: MediaWiki::outputResponsePayload seemingly causes net::ERR_HTTP2_PROTOCOL_ERROR 200 and compression issues in 1.35 a The World Burns token.
Mar 19 2021, 5:14 AM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), MW-1.35-notes, MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, wbstack, MW-1.35-release, Regression, MediaWiki-General, Performance-Team, Anti-Harassment, Cloud-VPS

Mar 15 2021

aaron created T277504: Add flag to support WRITE_ALLOW_SEGMENTS in WANCache.
Mar 15 2021, 7:42 PM · MediaWiki-Cache, Performance-Team
Daimona awarded T269894: Include paratest runner with stock MediaWiki a Yellow Medal token.
Mar 15 2021, 6:11 PM · MediaWiki-Core-Tests, Performance-Team

Mar 13 2021

aaron added a comment to T266904: Performance review of ext:StopForumSpam.

While we gzip memcached values, it would still be larger than the 1 MB limit. Even if I let WANCache use BagOStuff::WRITE_SEGMENTABLE, that is still a lot of I/O (even with "pcTTL" enabled).

Mar 13 2021, 6:44 AM · Patch-For-Review, user-sbassett, Performance-Team

Mar 12 2021

aaron added a comment to T277056: TypeError: Argument 1 passed to Wikimedia\Rdbms\TransactionProfiler::recordConnection() must be of the type string, null given.

The second one alone should be enough for a quick fix.

Mar 12 2021, 11:21 PM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Performance-Team, Platform Engineering, MediaWiki-General, Wikimedia-Rdbms

Mar 11 2021

aaron added a comment to T266904: Performance review of ext:StopForumSpam.

About how large is the IP list that will be stored in cache?

Mar 11 2021, 11:55 PM · Patch-For-Review, user-sbassett, Performance-Team

Mar 5 2021

aaron added a comment to T133523: Decide how to improve parsercache replication, sharding and HA.

Ideally the SqlBagOStuff hashing would use HashRing, though any naive transition would involve a lot of misses/churn at first.
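
A rough Python sketch of why a naive transition would cause churn. It assumes, for illustration only, that the existing placement behaves like a plain hash-modulo scheme (the actual SqlBagOStuff hashing is not shown here), and it uses a generic consistent-hash ring rather than MediaWiki's HashRing class. The server names and helper functions are hypothetical:

```python
import bisect
import hashlib


def h(value):
    """Stable integer hash (md5 is used for distribution, not security)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def modulo_server(key, servers):
    """Assumed 'old' placement: hash modulo the number of servers."""
    return servers[h(key) % len(servers)]


def build_ring(servers, points=100):
    """Place several virtual points per server on a sorted hash ring."""
    ring = sorted((h(f"{s}#{i}"), s) for s in servers for i in range(points))
    return [p for p, _ in ring], [s for _, s in ring]


def ring_server(key, ring):
    """Pick the first ring point at or after the key's hash, wrapping around."""
    hashes, owners = ring
    idx = bisect.bisect_left(hashes, h(key)) % len(hashes)
    return owners[idx]


servers = ["pc1", "pc2", "pc3"]  # illustrative names only
keys = [f"key:{i}" for i in range(3000)]
ring = build_ring(servers)

# Count keys whose server differs between the two schemes; each one
# would be a cache miss right after a naive switch.
moved = sum(modulo_server(k, servers) != ring_server(k, ring) for k in keys)
print(f"{moved / len(keys):.0%} of keys change server when switching schemes")
```

Because the two placements are effectively independent, most keys land on a different server after the switch, which matches the "misses/churn at first" concern.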

Mar 5 2021, 12:26 AM · Epic, Sustainability (Incident Followup), Performance-Team (Radar), SRE, DBA
aaron added a comment to T133523: Decide how to improve parsercache replication, sharding and HA.

Playing around with

mwscript shell.php aawiki

...I noticed that SHOW SLAVE STATUS is empty in eqiad for the 'pc3' slot server. Both have SHOW MASTER STATUS output and read_only = 0. Any reason the eqiad DBs are not listening to the codfw DB binlogs?

Mar 5 2021, 12:14 AM · Epic, Sustainability (Incident Followup), Performance-Team (Radar), SRE, DBA

Feb 26 2021

aaron added a comment to T246594: Prevent use of known buggy versions of PHP (that are greater than the minimum supported PHP version) (7.4.0 – 7.4.8, and 7.3.0 - 7.3.18).

Um @aaron... why are you changing the title back to blacklisting?

Feb 26 2021, 11:04 PM · MediaWiki-General

Feb 25 2021

aaron renamed T246594: Prevent use of known buggy versions of PHP (that are greater than the minimum supported PHP version) (7.4.0 – 7.4.8, and 7.3.0 - 7.3.18) from Blocklisting of newer PHP versions that have been patched already to Blacklisting of newer PHP versions that have been patched already.
Feb 25 2021, 10:37 PM · MediaWiki-General

Feb 23 2021

aaron added a comment to T254210: ParameterAssertionException "Bad value for parameter $row->rev_timestamp" from RevisionStoreRecord.php.

I see two places in ConvertibleTimestamp.php that recast generic "Exception" errors into "TimestampException", which would make convert() return false, which could cause this problem.

Feb 23 2021, 9:21 PM · MW-1.36-notes (1.36.0-wmf.38; 2021-04-06), Wikimedia-Timestamp, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Workboards (Clinic Duty Team), MediaWiki-Revision-backend, Wikimedia-production-error

Feb 22 2021

aaron closed T235705: Add BagOStuff metrics for read/write operations (bytes, key "class") grouped by type/ID as Resolved.
Feb 22 2021, 7:23 PM · MW-1.36-notes (1.36.0-wmf.28; 2021-01-26), MediaWiki-Cache, Performance-Team
aaron updated the task description for T235705: Add BagOStuff metrics for read/write operations (bytes, key "class") grouped by type/ID.
Feb 22 2021, 7:23 PM · MW-1.36-notes (1.36.0-wmf.28; 2021-01-26), MediaWiki-Cache, Performance-Team
aaron closed T254608: Monitor bytes written and read via WANObjectCache by key group, a subtask of T244852: Upgrade and improve our application object caching service (memcached), as Resolved.
Feb 22 2021, 7:14 PM · Patch-For-Review, SRE, serviceops
aaron closed T254608: Monitor bytes written and read via WANObjectCache by key group, a subtask of T235705: Add BagOStuff metrics for read/write operations (bytes, key "class") grouped by type/ID, as Resolved.
Feb 22 2021, 7:14 PM · MW-1.36-notes (1.36.0-wmf.28; 2021-01-26), MediaWiki-Cache, Performance-Team
aaron closed T254608: Monitor bytes written and read via WANObjectCache by key group as Resolved.
Feb 22 2021, 7:14 PM · MW-1.36-notes (1.36.0-wmf.27; 2021-01-19), Patch-For-Review, observability, Sustainability (Incident Followup), Performance-Team, MediaWiki-Cache
aaron updated the task description for T254608: Monitor bytes written and read via WANObjectCache by key group.
Feb 22 2021, 7:14 PM · MW-1.36-notes (1.36.0-wmf.27; 2021-01-19), Patch-For-Review, observability, Sustainability (Incident Followup), Performance-Team, MediaWiki-Cache

Feb 19 2021

aaron updated the task description for T254634: Determine and implement multi-dc strategy for ChronologyProtector.
Feb 19 2021, 10:51 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Patch-For-Review, User-jijiki, Sustainability (MediaWiki-MultiDC), Performance-Team, serviceops, Platform Engineering, Wikimedia-Rdbms
aaron updated the task description for T254634: Determine and implement multi-dc strategy for ChronologyProtector.
Feb 19 2021, 5:22 AM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Patch-For-Review, User-jijiki, Sustainability (MediaWiki-MultiDC), Performance-Team, serviceops, Platform Engineering, Wikimedia-Rdbms

Feb 16 2021

aaron updated the task description for T254634: Determine and implement multi-dc strategy for ChronologyProtector.
Feb 16 2021, 7:53 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Patch-For-Review, User-jijiki, Sustainability (MediaWiki-MultiDC), Performance-Team, serviceops, Platform Engineering, Wikimedia-Rdbms
aaron updated the task description for T91820: Create HTTP verb and sticky cookie DC routing in VCL .
Feb 16 2021, 4:49 AM · Patch-For-Review, Services (watching), Wikimania-Hackathon-2018, Sustainability (MediaWiki-MultiDC), SRE, Traffic

Feb 12 2021

aaron added a comment to T269324: Productionize x2 databases.
  • @Krinkle If your team could check what would be the behaviour if we simply depool a host via dbctl? Would that stop writes nicely? Would that break the site? :-)

Thank you

Feb 12 2021, 12:15 AM · Performance-Team (Radar), Patch-For-Review, DBA

Feb 11 2021

aaron added a comment to T269324: Productionize x2 databases.

Thanks @Kormat
@Krinkle @aaron - let's go for the x1 approach but with local masters being writable then?

Feb 11 2021, 11:45 PM · Performance-Team (Radar), Patch-For-Review, DBA
aaron updated the task description for T274455: AbuseFilter DB writes to af_hit_count during action=raw page views.
Feb 11 2021, 12:26 AM · MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), Performance-Team (Radar), Sustainability (MediaWiki-MultiDC), AbuseFilter
aaron added a comment to T274455: AbuseFilter DB writes to af_hit_count during action=raw page views.

Possibly triggered by MediaWiki\Auth\AuthManager->autoCreateUser calls, which I also see in the logging for the same reqId.

Feb 11 2021, 12:03 AM · MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), Performance-Team (Radar), Sustainability (MediaWiki-MultiDC), AbuseFilter

Feb 10 2021

aaron created T274455: AbuseFilter DB writes to af_hit_count during action=raw page views.
Feb 10 2021, 11:58 PM · MW-1.36-notes (1.36.0-wmf.33; 2021-03-02), Performance-Team (Radar), Sustainability (MediaWiki-MultiDC), AbuseFilter
aaron added a comment to T254634: Determine and implement multi-dc strategy for ChronologyProtector.

The only things that currently update the replication positions on HTTP GET, and that are not easily spotted entrypoints like rollback/createaccount/login (which can be routed like HTTP POST), are:

  • UpdateHitCountWatcher from AbuseFilter on edit form views (this should use the main stash or at least the job queue); does not need chronology-protector
  • CentralAuthUser->lazyImportLocalNames(); this should be solved by the migration script (T150506)
  • SpecialContentTranslation setting global preferences just to store "user has seen X" state; this should use the main stash (or the job queue at least)
  • ShortUrlHooks on page views; this will be fixed by T256993
Feb 10 2021, 11:29 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Patch-For-Review, User-jijiki, Sustainability (MediaWiki-MultiDC), Performance-Team, serviceops, Platform Engineering, Wikimedia-Rdbms

Feb 8 2021

aaron created T274174: Add a write timestamp field and flags field to objectcache table.
Feb 8 2021, 6:54 PM · MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Patch-For-Review, Performance-Team, MediaWiki-Cache

Feb 4 2021

aaron added a comment to T235554: MediaWiki::outputResponsePayload seemingly causes net::ERR_HTTP2_PROTOCOL_ERROR 200 and compression issues in 1.35.

I wonder what output_buffering value is being used in php.ini here. If I set it to off (not my distro default), I can trigger the MW_SETUP_CALLBACK use of ob_start()/OutputHandler::handle(). I definitely see some mismatch between the logic of OutputHandler::handle and MediaWiki::outputResponsePayload.

Feb 4 2021, 8:01 AM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), MW-1.35-notes, MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, wbstack, MW-1.35-release, Regression, MediaWiki-General, Performance-Team, Anti-Harassment, Cloud-VPS

Jan 29 2021

aaron added a comment to T235705: Add BagOStuff metrics for read/write operations (bytes, key "class") grouped by type/ID.

It appears rMW57325ba3bdce: objectcache: add statsd key metrics to BagOStuff classes is causing notices about undefined indexes:

Notice: Undefined index: wiki:user-quicktouched:id:1 in /vagrant/mediawiki/includes/libs/objectcache/MediumSpecificBagOStuff.php on line 1083
(10.0.2.15)
Notice: Undefined index: wiki:user-quicktouched:id:1 in /vagrant/mediawiki/includes/libs/objectcache/MediumSpecificBagOStuff.php on line 1098
(10.0.2.15)

I can only reproduce this on MediaWiki-Vagrant, which comes with a Redis setup out of the box. I don't know if this means Vagrant is misconfigured, or the aforementioned change is incomplete.

Jan 29 2021, 11:57 PM · MW-1.36-notes (1.36.0-wmf.28; 2021-01-26), MediaWiki-Cache, Performance-Team
aaron added a comment to T269324: Productionize x2 databases.

This is all done - hosts are ready to start getting data.

I was thinking that these would be set up just like the pcxxxx servers (e.g. each server in eqiad having circular replication with a corresponding server in codfw). Doing so will allow for things like https://phabricator.wikimedia.org/T113916 to proceed, since some use cases involve writes that can happen in either datacenter. The MediaWiki mainstash config will be similar to the parser cache config as well.

That wasn't my understanding when this was discussed at T212129. Also I didn't get that impression when we briefly spoke on IRC a few days ago, I thought we just wanted another x1 with eqiad <-> codfw replication between masters, not 3 more independent replication chains.
Handling parsercache, from an operational point of view, at the moment is really painful (ie: having to commit to MW for a depooling, having to always have 3 lines and just duplicate IPs on the array, having no spares...) and I would like to expand this by 3 more replication chains. It puts certainly lots of overhead when having to operate with them.

There's probably also some puppet work that would need to be done as we simply don't want to just create 3 new pc4, pc5 and pc6 as that would be confusing - I would prefer if @Kormat can estimate how much this could be.

Jan 29 2021, 11:11 PM · Performance-Team (Radar), Patch-For-Review, DBA

Jan 28 2021

aaron added a comment to T269324: Productionize x2 databases.

This is all done - hosts are ready to start getting data.

Jan 28 2021, 9:50 PM · Performance-Team (Radar), Patch-For-Review, DBA

Jan 27 2021

aaron closed T273006: MediumSpecificBagOStuff.php Undefined offset errors, a subtask of T271342: 1.36.0-wmf.28 deployment blockers, as Resolved.
Jan 27 2021, 9:51 PM · User-brennen, Patch-For-Review, Release-Engineering-Team-TODO (2021-01-01 to 2021-03-31 (Q3)), Release, Train Deployments
aaron closed T273006: MediumSpecificBagOStuff.php Undefined offset errors as Resolved.

Logstash no longer shows the errors after the deploy.

Jan 27 2021, 9:51 PM · MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), Platform Team Workboards (Clinic Duty Team), Wikimedia-production-error, MediaWiki-Cache, User-brennen
aaron added a comment to T254210: ParameterAssertionException "Bad value for parameter $row->rev_timestamp" from RevisionStoreRecord.php.

It would be interesting to see if the rate of occurrence changes after T266055 is deployed.

Jan 27 2021, 9:21 PM · MW-1.36-notes (1.36.0-wmf.38; 2021-04-06), Wikimedia-Timestamp, MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), Platform Team Workboards (Clinic Duty Team), MediaWiki-Revision-backend, Wikimedia-production-error
aaron added a comment to T273006: MediumSpecificBagOStuff.php Undefined offset errors.

@aaron, is https://gerrit.wikimedia.org/r/658780 sufficient to take care of this problem or do I need https://gerrit.wikimedia.org/r/c/mediawiki/core/+/658781 as well?

Jan 27 2021, 8:32 PM · MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), Platform Team Workboards (Clinic Duty Team), Wikimedia-production-error, MediaWiki-Cache, User-brennen
aaron added a comment to T273006: MediumSpecificBagOStuff.php Undefined offset errors.

I can repro with

./srv_paratest --filter BagOStuff --use-bagostuff=redis
Jan 27 2021, 3:54 AM · MW-1.36-notes (1.36.0-wmf.29; 2021-02-02), Platform Team Workboards (Clinic Duty Team), Wikimedia-production-error, MediaWiki-Cache, User-brennen

Jan 26 2021

aaron raised the priority of T254608: Monitor bytes written and read via WANObjectCache by key group from Medium to High.
Jan 26 2021, 4:14 AM · MW-1.36-notes (1.36.0-wmf.27; 2021-01-19), Patch-For-Review, observability, Sustainability (Incident Followup), Performance-Team, MediaWiki-Cache
aaron updated the task description for T254608: Monitor bytes written and read via WANObjectCache by key group.
Jan 26 2021, 4:13 AM · MW-1.36-notes (1.36.0-wmf.27; 2021-01-19), Patch-For-Review, observability, Sustainability (Incident Followup), Performance-Team, MediaWiki-Cache
aaron updated the task description for T235705: Add BagOStuff metrics for read/write operations (bytes, key "class") grouped by type/ID.
Jan 26 2021, 4:11 AM · MW-1.36-notes (1.36.0-wmf.28; 2021-01-26), MediaWiki-Cache, Performance-Team
aaron added a comment to T272901: Standardize RevisionRecord/Store use of wikiID vs dbDomain.

It should use DB Domains, which can always be converted to wiki IDs (though not 100% the other way around in some messy legacy edge cases that do not affect WMF). This is also what LoadBalancer has always expected in its methods.

Jan 26 2021, 12:22 AM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Platform Team Workboards (MW Expedition)

Jan 25 2021

aaron moved T268815: Disable Fresnel hard error for changes in paint timing from Doing (old) to Backlog: Maintenance on the Performance-Team board.
Jan 25 2021, 7:55 PM · Performance-Team, Fresnel
Kghbln awarded T235554: MediaWiki::outputResponsePayload seemingly causes net::ERR_HTTP2_PROTOCOL_ERROR 200 and compression issues in 1.35 a The World Burns token.
Jan 25 2021, 4:45 PM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), MW-1.35-notes, MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, wbstack, MW-1.35-release, Regression, MediaWiki-General, Performance-Team, Anti-Harassment, Cloud-VPS

Jan 24 2021

Addshore awarded T235554: MediaWiki::outputResponsePayload seemingly causes net::ERR_HTTP2_PROTOCOL_ERROR 200 and compression issues in 1.35 a The World Burns token.
Jan 24 2021, 2:41 PM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), MW-1.35-notes, MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, wbstack, MW-1.35-release, Regression, MediaWiki-General, Performance-Team, Anti-Harassment, Cloud-VPS

Jan 21 2021

Ciencia_Al_Poder awarded T235554: MediaWiki::outputResponsePayload seemingly causes net::ERR_HTTP2_PROTOCOL_ERROR 200 and compression issues in 1.35 a The World Burns token.
Jan 21 2021, 10:09 AM · MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), MW-1.35-notes, MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, wbstack, MW-1.35-release, Regression, MediaWiki-General, Performance-Team, Anti-Harassment, Cloud-VPS
aaron added a comment to T272078: mc1024 broke - replace it or remove it from configs.

@Krinkle @aaron the gutter pool sets a max TTL of 600s to any key with a TTL over 600s, do you think it is fine to keep the gutter-pool substitute the missing server?

Jan 21 2021, 12:07 AM · serviceops

Jan 12 2021

aaron closed T221159: FY18/19 TEC1.6 Q4: Improve or replace the usage of GTID_WAIT with pt-heartbeat in MW as Declined.

I don't think it would be worth using pt-heartbeat for LoadBalancer::waitFor() unless the precision was much higher (likely problematically high in terms of spammy heartbeat table updates).

Jan 12 2021, 5:21 AM · User-Kormat, MW-1.36-notes (1.36.0-wmf.20; 2020-12-01), Patch-For-Review, Performance-Team (Radar), User-mobrovac, Services (watching), Goal, Wikimedia-Rdbms, DBA
aaron closed T221159: FY18/19 TEC1.6 Q4: Improve or replace the usage of GTID_WAIT with pt-heartbeat in MW, a subtask of T88445: MediaWiki active/active datacenter investigation and work (tracking), as Declined.
Jan 12 2021, 5:20 AM · Platform Team Workboards (Initiatives), Platform Team Initiatives (Multi-DC (TEC1)), User-mobrovac, Performance-Team (Radar), Sustainability (MediaWiki-MultiDC), Epic

Jan 7 2021

aaron added a comment to T269326: Create RequestTimeout library.

I'm still thinking about re-adding scoped critical sections, because otherwise every critical section needs to be wrapped in try/finally, which is annoying. Yes a fatal error will be generated if a critical section scope is destroyed during request shutdown, but maybe that is the least bad option.

I wonder if the scope object should have an explicit exit function, to make it a bit more obvious in calling code. So most timeout exceptions will be thrown from the exit() call, not from __destruct().
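
A minimal Python sketch of the scope-object idea, under stated assumptions: RequestTimer, CriticalSectionScope and TimeoutException are hypothetical names, not the RequestTimeout library's API, and the timeout is triggered by hand. Python's `__del__` also differs from PHP's `__destruct()` in that exceptions raised there are not propagated, so the fallback only releases the section:

```python
class TimeoutException(Exception):
    """Hypothetical exception raised when the request time limit has passed."""


class RequestTimer:
    """Hypothetical timer that defers timeouts while critical sections are open."""

    def __init__(self):
        self.depth = 0
        self.pending = False

    def on_timeout(self):
        # Called when the limit expires (simulated manually in this sketch).
        if self.depth > 0:
            self.pending = True  # defer until the critical section exits
        else:
            raise TimeoutException("request timed out")

    def enter_critical_section(self):
        self.depth += 1
        return CriticalSectionScope(self)


class CriticalSectionScope:
    def __init__(self, timer):
        self.timer = timer
        self.exited = False

    def exit(self):
        """Explicit exit: the usual place where a deferred timeout is thrown."""
        if self.exited:
            return
        self.exited = True
        self.timer.depth -= 1
        if self.timer.depth == 0 and self.timer.pending:
            self.timer.pending = False
            raise TimeoutException("request timed out during critical section")

    def __del__(self):
        # Fallback if the caller forgot exit(): release the section only.
        if not self.exited:
            self.exited = True
            self.timer.depth -= 1


timer = RequestTimer()
scope = timer.enter_critical_section()
timer.on_timeout()  # deferred, because a critical section is open
try:
    scope.exit()  # the deferred timeout surfaces here
    result = "no timeout"
except TimeoutException as e:
    result = str(e)
print(result)
```

With the explicit exit() call, the calling code shows exactly where a deferred timeout can be thrown, without wrapping every critical section in try/finally.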

Jan 7 2021, 1:48 AM · Patch-For-Review, MW-1.36-notes (1.36.0-wmf.31; 2021-02-16), Platform Team Workboards (Clinic Duty Team)

Jan 5 2021

aaron added a comment to T264604: Enable "/*/mw-with-onhost-tier/" route for MediaWiki where safe.

@Krinkle @aaron do you think we are ready to move this forward?

Jan 5 2021, 7:02 PM · MW-1.36-notes, MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), Patch-For-Review, User-jijiki, SRE, serviceops, Performance-Team
aaron added a comment to T270994: Re-deleting a Commons file: "Error deleting file: The file "mwstore://local-multiwrite/local-deleted/..." is in an inconsistent state within the internal storage backends"..

At first, I suspected a timeout causing a failed deferred update, but it seems that no failure was logged, likely due to NullLogger being used by sub-backends of FileBackendMultiWrite.

Jan 5 2021, 12:27 AM · Structured Data Engineering, MediaWiki-File-management, Structured-Data-Backlog, Wikimedia-production-error, Commons, MW-1.36-notes (1.36.0-wmf.28; 2021-01-26), SRE, SRE-swift-storage

Jan 4 2021

aaron added a comment to T270994: Re-deleting a Commons file: "Error deleting file: The file "mwstore://local-multiwrite/local-deleted/..." is in an inconsistent state within the internal storage backends"..

A subset of the log entries are bogus though (should be DEBUG, not ERROR).

Jan 4 2021, 11:58 PM · Structured Data Engineering, MediaWiki-File-management, Structured-Data-Backlog, Wikimedia-production-error, Commons, MW-1.36-notes (1.36.0-wmf.28; 2021-01-26), SRE, SRE-swift-storage
aaron added a comment to T250417: Deprecate and remove SquidPurgeClient classes.

See also: T264735

Jan 4 2021, 10:59 PM · MW-1.36-notes (1.36.0-wmf.25; 2021-01-05), MW-1.36-release, MW-1.35-notes (1.35.0-wmf.30; 2020-04-28), Technical-Debt (Deprecation process), Performance-Team, MediaWiki-Cache

Dec 14 2020

jijiki awarded T212129: Move MainStash out of Redis to a simpler multi-dc aware solution a Stroopwafel token.
Dec 14 2020, 1:13 PM · Performance-Team, Sustainability (MediaWiki-MultiDC), MediaWiki-General, serviceops-radar, User-mobrovac, User-jijiki, SRE

Dec 12 2020

aaron closed T250239: Make BagOStuff key encoding more consistent as Resolved.
Dec 12 2020, 2:17 AM · MediaWiki-Cache, Performance-Team

Dec 11 2020

aaron closed T261534: Strengthen the shared cache key logic in FileRepo classes as Resolved.
Dec 11 2020, 9:34 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Sustainability (Incident Followup), Patch-For-Review, Structured-Data-Backlog, Performance-Team (Radar), Structured Data Engineering, MediaWiki-File-management

Dec 10 2020

aaron created T269894: Include paratest runner with stock MediaWiki.
Dec 10 2020, 9:57 PM · MediaWiki-Core-Tests, Performance-Team

Dec 9 2020

aaron closed T265778: Make FileBackend code that scrapes PHP warning robust as Resolved.

Does this happen even if $wgShellLocale is set to the default of C.UTF8? We call setlocale() in Setup.php, and I think it should always be C.UTF8 or en_US.UTF8, there's not much justification for changing it.

Dec 9 2020, 11:02 PM · MW-1.35-notes, MW-1.36-notes (1.36.0-wmf.21; 2020-12-08), Patch-For-Review, Performance-Team, Commons, MediaWiki-File-management

Dec 7 2020

aaron closed T266502: Deprecate and remove wfMemcKey() as Resolved.
Dec 7 2020, 7:01 PM · MW-1.36-notes (1.36.0-wmf.20; 2020-12-01), Patch-For-Review, Technical-Debt (Deprecation process), Performance-Team, MediaWiki-Cache

Dec 3 2020

aaron added a comment to T269326: Create RequestTimeout library.

Related task: T269325

Dec 3 2020, 8:04 PM · Patch-For-Review, MW-1.36-notes (1.36.0-wmf.31; 2021-02-16), Platform Team Workboards (Clinic Duty Team)
aaron created T269325: Implement a reasonable strategy for handling Excimer-style timeouts in MediaWiki.
Dec 3 2020, 6:06 AM · Performance-Team (Radar), MediaWiki-General

Dec 1 2020

aaron created T269161: Disallow direct "BEGIN"/"COMMIT"/"ROLLBACK" via Database::query().
Dec 1 2020, 7:48 PM · Performance-Team, Wikimedia-Rdbms, Platform Engineering

Nov 25 2020

aaron updated the task description for T261534: Strengthen the shared cache key logic in FileRepo classes.
Nov 25 2020, 9:34 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Sustainability (Incident Followup), Patch-For-Review, Structured-Data-Backlog, Performance-Team (Radar), Structured Data Engineering, MediaWiki-File-management
aaron updated the task description for T261534: Strengthen the shared cache key logic in FileRepo classes.
Nov 25 2020, 9:34 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Sustainability (Incident Followup), Patch-For-Review, Structured-Data-Backlog, Performance-Team (Radar), Structured Data Engineering, MediaWiki-File-management
aaron added a comment to T193565: Foreign query for metawiki fails with "Table 'centralauth.page' doesn't exist" (DBConnRef mixup?).

I'm seeing timeouts associated with these entries, e.g.:

Nov 25 2020, 1:42 AM · Platform Team Workboards (Clinic Duty Team), Sustainability (Incident Followup), Wikimedia-production-error, Wikimedia-Rdbms

Nov 20 2020

dancy awarded T267668: Some recent Commons uploads not available on other wikis (2020-11) an Orange Medal token.
Nov 20 2020, 12:33 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Patch-For-Review, Wikimedia-production-error, SRE, MediaWiki-File-management, Commons

Nov 18 2020

aaron closed T266927: redis.log spammed with "worthRefreshPopular(1604102439.6853, 60, 900, 1604104011.4142): p = 0.0012121212121212; refresh = N" as Declined.

Looked like a config error.

Nov 18 2020, 12:33 AM · MediaWiki-Cache, MediaWiki-General

Nov 17 2020

aaron added a comment to T267061: Save Timing regression starting around 2020-10-27.

This seems to have gotten better over the weeks. Not sure why.

Nov 17 2020, 8:01 PM · Performance-Team
aaron closed T264787: Make WANCache worthRefreshExpiring() account for values with FLD_TTL less than $lowTTL as Resolved.
Nov 17 2020, 5:55 PM · MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), Performance-Team, MediaWiki-Cache

Nov 13 2020

aaron added a comment to T247028: Database 'INSERT' query rate doubled (module_deps regression?).

What is the state of this now? Are there any query graphs specific to this table?

Nov 13 2020, 11:50 PM · MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), Sustainability (Incident Followup), Performance Issue, Performance-Team, MediaWiki-ResourceLoader
aaron added a comment to T267455: Sunset/Archive Daddio skin.

No objections here.

Nov 13 2020, 6:47 AM · translatewiki.net, Wikimedia-GitHub, Diffusion-Repository-Administrators, Projects-Cleanup, Patch-For-Review, Technical-Debt (Deprecation process), Release-Engineering-Team, MediaWiki-skins-Daddio

Nov 6 2020

aaron added a comment to T252951: ResourceLoader DepStore lock acquired twice?.

I see plenty of timeouts where the error is not logged twice. Those that happen twice seem to be about 200 ms apart.

Nov 6 2020, 10:53 PM · Performance-Team, MediaWiki-ResourceLoader
aaron added a comment to T264787: Make WANCache worthRefreshExpiring() account for values with FLD_TTL less than $lowTTL.

It could be related if the problem has to do with regenerations (since it was intermittent). In any case, excessive regeneration is a problem in itself.

Nov 6 2020, 10:38 PM · MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), Performance-Team, MediaWiki-Cache

Nov 5 2020

aaron added a comment to T252951: ResourceLoader DepStore lock acquired twice?.

Searching for << +channel:memcached +message:/.*deps.*/ >> I still see this sometimes.

Nov 5 2020, 9:10 PM · Performance-Team, MediaWiki-ResourceLoader

Nov 4 2020

aaron updated the task description for T261534: Strengthen the shared cache key logic in FileRepo classes.
Nov 4 2020, 1:02 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Sustainability (Incident Followup), Patch-For-Review, Structured-Data-Backlog, Performance-Team (Radar), Structured Data Engineering, MediaWiki-File-management
aaron added a comment to T264604: Enable "/*/mw-with-onhost-tier/" route for MediaWiki where safe.

@aaron is there a timeline as to when those patches will be merged?

Nov 4 2020, 12:08 AM · MW-1.36-notes, MW-1.37-notes (1.37.0-wmf.1; 2021-04-13), Patch-For-Review, User-jijiki, SRE, serviceops, Performance-Team

Oct 30 2020

aaron closed T250407: Deprecate wfForeignMemcKey() and BagOStuff::getKeyInternal() as Resolved.
Oct 30 2020, 9:41 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Technical-Debt (Deprecation process), MediaWiki-Cache, Performance-Team