tstarling (Tim Starling)
UserAdministrator

User Details

User Since: Oct 15 2014, 8:27 PM (380 w, 22 h)
Roles: Administrator
Availability: Available
LDAP User: Tim Starling
MediaWiki User: Tim Starling (WMF) [ Global Accounts ]

Recent Activity

Today

tstarling added a comment to T300194: Wikimedia\Rdbms\DBTransactionSizeError: Transaction spent 3.6s in writes, exceeding the 3s limit.

I reproduced this locally, and my fix above seems to be sufficient. I'm pretty sure the intention of Aaron's change was to replace QUERY_IGNORE_DBO_TRX | QUERY_CHANGE_NONE with QUERY_CHANGE_LOCKS purely for brevity, with the same semantics applied.

Thu, Jan 27, 4:02 AM · MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), Patch-For-Review, Platform Engineering, Commons, User-brennen, Wikimedia-production-error
tstarling added a comment to T299693: Memcached::cas(): Argument #4 ($expiration) must be of type int, int given.

People are begging for a release to fix this issue at https://github.com/php-memcached-dev/php-memcached/issues/496

Thu, Jan 27, 12:12 AM · Upstream, PHP 8.1 support, Performance-Team, MediaWiki-libs-ObjectCache

Yesterday

tstarling added a comment to T299693: Memcached::cas(): Argument #4 ($expiration) must be of type int, int given.

I was able to reproduce the bug and confirm that the commit I linked to fixes it.

Wed, Jan 26, 11:56 PM · Upstream, PHP 8.1 support, Performance-Team, MediaWiki-libs-ObjectCache

Tue, Jan 25

tstarling closed T299696: substr(): Passing null to parameter #1 ($string) of type string is deprecated as Resolved.

I think this task only refers to the specific case of update.php, so this is resolved. T289926 is for the audit.

Tue, Jan 25, 3:43 AM · MW-1.36-notes, MW-1.37-notes, MW-1.35-notes, MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), PHP 8.1 support, MediaWiki-Installer
tstarling closed T292373: mediawiki/oauthclient-php does not support PHP8 as Resolved.
Tue, Jan 25, 2:54 AM · PHP 8.1 support, PHP 8.0 support, MediaWiki-extensions-OAuth
tstarling added a comment to T260735: Stop using is_resource().

For debug output, is_resource() is often appropriate, e.g.

Tue, Jan 25, 1:54 AM · MW-1.38-notes (1.38.0-wmf.20; 2022-01-31), wdwb-tech, Upstream, PHP 8.1 support, Wikidata, MediaWiki-extensions-WikibaseRepository, MediaWiki-extensions-Html2Wiki, MediaWiki-extensions-QuickGV, MediaWiki-General, PHP 8.0 support

Mon, Jan 24

tstarling added a comment to T299693: Memcached::cas(): Argument #4 ($expiration) must be of type int, int given.

There is this commit in the php-memcached repo which may be related: https://github.com/php-memcached-dev/php-memcached/commit/51c9baf49f96c5f35be8257549f426ef1860f0ef

Mon, Jan 24, 8:26 PM · Upstream, PHP 8.1 support, Performance-Team, MediaWiki-libs-ObjectCache
tstarling added a comment to T299693: Memcached::cas(): Argument #4 ($expiration) must be of type int, int given.

That's an odd error message, is that reproducible?

Mon, Jan 24, 7:47 PM · Upstream, PHP 8.1 support, Performance-Team, MediaWiki-libs-ObjectCache
tstarling closed T297665: MapCacheLRUTest::testHasInvalidKey fails on PHP 8.0.8 as Resolved.

This was fixed by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/755846

Mon, Jan 24, 1:56 AM · Patch-For-Review, PHP 8.0 support, MediaWiki-Core-Tests

Sun, Jan 23

tstarling claimed T299798: PageImages no longer updating on stubs.
Sun, Jan 23, 11:51 PM · MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), Patch-For-Review, Platform Team Workboards, Regression, PageImages

Fri, Jan 21

tstarling added a comment to T283275: Make MW master tests pass on PHP 8.0.

Composer works now, following gerrit 748168. All core tests pass for me after my changes above and the one linked to T268847.

Fri, Jan 21, 4:29 AM · MW-1.37-notes, MW-1.35-notes, MW-1.36-notes, MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), Patch-For-Review, PHP 8.0 support, MediaWiki-General
tstarling added a comment to T268847: PHP 8: libxml_disable_entity_loader() is deprecated.
  1. Implement a way to restore the loader, per the upstream bug.
Fri, Jan 21, 2:19 AM · MW-1.36-notes, MW-1.37-notes, MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), MW-1.35-notes, MediaWiki-General, PHP 8.0 support

Thu, Jan 20

tstarling added a comment to T292322: Support large files in Shellbox.

But given in reality I was proposing to do something like:

signature = md5sum( secret + padding + request_body)

the attacker, in order to trick shellbox, would not need to match "signature", but rather to figure out "secret", which isn't how md5 has been broken. So I think it would be an actually valid way to assess the integrity of the request. Let me also note that depending on the length of the secret key, it is probably possible to brute-force guess it, given enough computational power, and yes using a faster hashing algorithm will reduce the computational cost of doing so.

Thu, Jan 20, 10:33 PM · MW-1.38-notes (1.38.0-wmf.18; 2022-01-17), Patch-For-Review, SRE-swift-storage, Shellbox, serviceops, MW-on-K8s
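
The keyed-signature idea quoted in the comment above can be sketched as follows. This is only an illustration, not Shellbox's code: it swaps the quoted md5sum(secret + padding + request_body) construction for the standard HMAC construction with SHA-256 from Python's standard library, and the function names sign_request and verify_request are hypothetical.

```python
import hashlib
import hmac


def sign_request(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of `body`, keyed with `secret`.

    Assumption: HMAC-SHA256 stands in for the md5-based formula quoted
    in the comment; the real Shellbox signing scheme may differ.
    """
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(secret, body)
    return hmac.compare_digest(expected, signature)
```

As the comment notes, a forger has to recover the secret rather than find a collision, so the length of the key matters more than the collision resistance of the hash.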
tstarling added a comment to T296610: MediumSpecificBagOStuff->guessSerialValueSize infinite loop when storing Title object (Special:Homepage throws "Maximum function nesting reached").

Was any benchmarking done to support the introduction of this feature? There was no linked bug on https://gerrit.wikimedia.org/r/c/mediawiki/core/+/580607 . I'm having trouble trying to find a test case for which it is faster than json_encode().

Thu, Jan 20, 12:40 AM · MW-1.38-notes (1.38.0-wmf.20; 2022-01-31), Patch-For-Review, Performance-Team, GrowthExperiments-NewcomerTasks, MediaWiki-libs-ObjectCache, Growth-Team

Wed, Jan 19

tstarling added a comment to T249985: Lithuanian Category Collation: Articles starting with y grouped together with articles starting with i, but those are two different letters.

If it's a bug, then the bug is in the ICU sort order not in MediaWiki's first letter identification.

Wed, Jan 19, 3:44 AM · MediaWiki-Categories, Wikimedia-Site-requests
tstarling added a comment to T292322: Support large files in Shellbox.

There is openssl_digest() which presumably has hardware acceleration and can do SHA-256 in 2.1 seconds per gigabyte. But its input is a single string, not a stream, so it can't quite work as a drop-in replacement. At least it gives you some idea of what an optimisation patch for PHP could do.

Wed, Jan 19, 2:15 AM · MW-1.38-notes (1.38.0-wmf.18; 2022-01-17), Patch-For-Review, SRE-swift-storage, Shellbox, serviceops, MW-on-K8s
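
The limitation described above (a digest function that takes one string rather than a stream) can be illustrated with an incremental hash. This is a minimal sketch in Python rather than PHP, assuming the file is read in fixed-size chunks; sha256_stream and the 1 MiB chunk size are illustrative choices, and it says nothing about the speed of any particular PHP function.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a stream chunk by chunk so the whole input never sits in memory."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # The streamed digest matches the one-shot digest of the same bytes.
    data = b"example payload" * 1000
    print(sha256_stream(io.BytesIO(data)) == hashlib.sha256(data).hexdigest())
```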

Tue, Jan 18

tstarling added a comment to T292322: Support large files in Shellbox.

52 seconds in Shellbox\Client::computeHmac over 3 calls, I guess all signatures for the remote shellbox calls

Tue, Jan 18, 2:13 AM · MW-1.38-notes (1.38.0-wmf.18; 2022-01-17), Patch-For-Review, SRE-swift-storage, Shellbox, serviceops, MW-on-K8s

Mon, Jan 17

tstarling added a comment to T293958: 1.38.0-wmf.17 deployment blockers.

We've text-messaged @tstarling but it's Saturday morning where he lives so I can't be sure when/if Tim will respond here on the weekend.

Mon, Jan 17, 3:36 AM · Patch-For-Review, Release-Engineering-Team (Next), Release, Train Deployments
tstarling added a comment to T296895: LinksUpdate hook review.

As I commented on T299149#7620494, I'm pretty unhappy we added a ParserModifyImageHTML hook here:

  1. Arbitrary modification of HTML will certainly cause havoc with VisualEditor. I don't think we should be adding new hooks which don't play well with the near-future use of Parsoid for read view HTML. Mutating the HTML to add a comment is a change which is certainly visible to Visual Editor as well as Parsoid during html2wt, as comments are DOM nodes and not stripped in the Parsoid world.
Mon, Jan 17, 3:22 AM · MW-1.38-notes (1.38.0-wmf.18; 2022-01-17), Patch-For-Review, Growth-Team, Platform Team Workboards (MW Expedition), PageImages, Notifications, MediaWiki-extensions-Translate, MediaWiki-Core-Hooks

Thu, Jan 13

tstarling added a comment to T299149: MWException: Parser state cleared while parsing. Did you call Parser::parse recursively?.

It would have been complicated to set up CommonsMetadata locally in order to reproduce the bug, so I modelled the bug with a patch to FormatMetadata:

Thu, Jan 13, 10:45 PM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), PageImages, MediaWiki-Parser, Wikimedia-production-error
tstarling added a comment to T299149: MWException: Parser state cleared while parsing. Did you call Parser::parse recursively?.

@tstarling can investigate this better but the PageImages patch may potentially need a revert to figure out the right strategy here since the metadata fetch is requiring a (recursive) wikitext parse compared to before where the links-update hook invocation wasn't part of the main page parse. /cc @Krinkle

Thu, Jan 13, 10:13 PM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), PageImages, MediaWiki-Parser, Wikimedia-production-error
tstarling closed T299095: Links tables corrupted due to incorrectly parenthesized delete queries as Resolved.
Thu, Jan 13, 9:30 PM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.

How did you find the offending query? In the slow query log?

Thu, Jan 13, 12:57 PM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.

In s2 and s3 I used

Thu, Jan 13, 5:31 AM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.
  • In s3, the affected wikis were elwiktionary eowiktionary hiwiki incubatorwiki itwikiquote ruwiktionary simplewiki specieswiki trwiktionary. The total restored row count is 226335.
Thu, Jan 13, 4:59 AM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling edited P18710 undelete-pagelinks.pl.
Thu, Jan 13, 4:42 AM
tstarling added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.
  • In s2, only itwiki is affected. I undeleted 11565 rows.
Thu, Jan 13, 4:41 AM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.
  • Using the statement-based binlog, I confirmed that there were no affected deletes on commonswiki.iwlinks.
  • I confirmed that there were no affected deletes on commonswiki.templatelinks. Templatelinks is rarely affected because the bug is triggered when multiple target namespaces appear in a delete query.
  • On wikidatawiki there was only one bad query, and it deleted 1.7M rows linking to [[P156]]. I manually trimmed the unrelated rows from the undelete SQL and started the import.
Thu, Jan 13, 3:33 AM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling edited P18710 undelete-pagelinks.pl.
Thu, Jan 13, 3:19 AM
tstarling added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.

I used the statement-based binlog to determine that for commonswiki pagelinks, the first affected query was at 20:20:40 and the last was at 20:33:23. So I'll narrow the range for undeletion accordingly.

Thu, Jan 13, 2:17 AM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.

The plan is to undelete pagelinks on commonswiki which were deleted between 20:19:00 and 20:40:00 using the pasted perl script and sql.php with the batch size patch. The sooner the better, since the undeleted link data becomes more stale as time goes by. I will go ahead as soon as someone reviews the concept and voices approval.

Thu, Jan 13, 1:55 AM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.

I wrote this perl script: P18710. It makes a 500MB SQL file from the commonswiki pagelinks deletions.

Thu, Jan 13, 1:27 AM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling created P18710 undelete-pagelinks.pl.
Thu, Jan 13, 1:26 AM

Wed, Jan 12

tstarling added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.
DELETE FROM `templatelinks` WHERE (tl_from = 9691118 AND (tl_namespace = 10 AND tl_title IN ('LangSwitch','Purge') ) OR (tl_namespace = 828 AND tl_title = 'LangSwitch'))
Wed, Jan 12, 11:53 PM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
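
The effect of the parenthesization in the DELETE query quoted above can be reproduced on a toy table. In SQL, AND binds more tightly than OR, so the second namespace condition is not restricted by tl_from. The sketch below uses an in-memory SQLite database with invented rows and small page IDs; the GOOD grouping is an assumption about what the query was meant to express, not the actual MediaWiki fix.

```python
import sqlite3

# Same shape as the quoted query: the OR branch escapes the tl_from condition.
BAD = (
    "(tl_from = 1 AND (tl_namespace = 10 AND tl_title IN ('LangSwitch','Purge'))"
    " OR (tl_namespace = 828 AND tl_title = 'LangSwitch'))"
)
# Assumed intent: tl_from applies to both namespace conditions.
GOOD = (
    "tl_from = 1 AND ((tl_namespace = 10 AND tl_title IN ('LangSwitch','Purge'))"
    " OR (tl_namespace = 828 AND tl_title = 'LangSwitch'))"
)


def rows_left_after_delete(where: str) -> list:
    """Build a toy templatelinks table, run the DELETE, return surviving rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE templatelinks "
        "(tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT)"
    )
    conn.executemany(
        "INSERT INTO templatelinks VALUES (?, ?, ?)",
        [
            (1, 10, "LangSwitch"),
            (1, 828, "LangSwitch"),
            (2, 828, "LangSwitch"),  # link from a different page
            (2, 10, "Other"),
        ],
    )
    conn.execute("DELETE FROM templatelinks WHERE " + where)
    rows = conn.execute(
        "SELECT tl_from, tl_namespace, tl_title FROM templatelinks "
        "ORDER BY tl_from, tl_namespace"
    ).fetchall()
    conn.close()
    return rows
```

With BAD, the link from page 2 to the namespace-828 'LangSwitch' target is deleted as well; with GOOD it survives.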
tstarling added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.

I mean the insert traffic will be there already, since the old version will slowly repair deleted links. So we should throttle the jobs now if they are a problem.

Wed, Jan 12, 11:49 PM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling renamed T299095: Links tables corrupted due to incorrectly parenthesized delete queries from Wikimedia\Rdbms\DBReadOnlyError: Database is read-only: The database is read-only until replication lag decreases. to Links tables corrupted due to incorrectly parenthesized delete queries.
Wed, Jan 12, 11:42 PM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.

There will be additional insert traffic as refreshlinks jobs reinsert the deleted links.

Wed, Jan 12, 11:40 PM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling added a comment to T299095: Links tables corrupted due to incorrectly parenthesized delete queries.
DELETE FROM `templatelinks` WHERE (tl_from = 9691118 AND (tl_namespace = 10 AND tl_title IN ('LangSwitch','Purge') ) OR (tl_namespace = 828 AND tl_title = 'LangSwitch'))
Wed, Jan 12, 11:07 PM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Wikimedia-Incident, Patch-For-Review, Platform Engineering, Wikimedia-production-error
tstarling added a comment to T240775: RFC: Support PHP 7.4 preload.

I explored the problem using the patch above. My wiki had 38 extensions enabled and generated a preload file with 2833 files. I implemented both a maintenance script and a dynamic approach. I used PHP 8.0.

Wed, Jan 12, 5:47 AM · MW-1.38-notes (1.38.0-wmf.20; 2022-01-31), Patch-For-Review, MediaWiki-General, MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Performance-Team, TechCom-RFC

Tue, Jan 11

tstarling added a comment to T298930: PageImages is causing core parser test failures in other extensions, e.g. GrowthExperiments.

I don't think it's a parser test issue. Those comments are leaking into the page indicators and need to be stripped out.

Tue, Jan 11, 4:25 AM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Growth-Team, ci-test-error (WMF-deployed Build Failure), GrowthExperiments, PageImages
tstarling added a comment to T240775: RFC: Support PHP 7.4 preload.

Never mind, it doesn't work that way. All data from the preload script is discarded. Only functions and classes are preserved.

Tue, Jan 11, 12:22 AM · MW-1.38-notes (1.38.0-wmf.20; 2022-01-31), Patch-For-Review, MediaWiki-General, MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Performance-Team, TechCom-RFC

Mon, Jan 10

tstarling added a comment to T240775: RFC: Support PHP 7.4 preload.

The library can provide an include wrapper which tries to return the array from a preloaded cache, falling back to actually including the file.

Mon, Jan 10, 6:52 AM · MW-1.38-notes (1.38.0-wmf.20; 2022-01-31), Patch-For-Review, MediaWiki-General, MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Performance-Team, TechCom-RFC
tstarling added a comment to T240775: RFC: Support PHP 7.4 preload.

I have an idea:

Mon, Jan 10, 6:06 AM · MW-1.38-notes (1.38.0-wmf.20; 2022-01-31), Patch-For-Review, MediaWiki-General, MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Performance-Team, TechCom-RFC

Fri, Jan 7

tstarling added a comment to T293546: Increase $wgMaxTemplateDepth (template expansion depth) on Wikispecies.

Just make it 100 everywhere.

Fri, Jan 7, 11:57 PM · MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), Performance-Team, Wikimedia-Site-requests
tstarling closed T298047: Performance Team onboarding for tstarling as Resolved.
Fri, Jan 7, 5:38 AM · Performance-Team
tstarling updated the task description for T298047: Performance Team onboarding for tstarling.
Fri, Jan 7, 5:38 AM · Performance-Team
tstarling closed T298659: BadMethodCallException: Sessions are disabled for load entry point as Resolved.

The revert is merged in master. Any remaining work is part of T161976.

Fri, Jan 7, 4:50 AM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Growth-Team, MediaWiki-Recent-changes, MediaWiki-extensions-Scribunto, Wikimedia-production-error
tstarling closed T298659: BadMethodCallException: Sessions are disabled for load entry point, a subtask of T161976: Feature request: add detection for page language to Scribunto, as Resolved.
Fri, Jan 7, 4:50 AM · MW-1.38-notes (1.38.0-wmf.20; 2022-01-31), Patch-For-Review, MediaWiki-extensions-Scribunto
tstarling closed T298659: BadMethodCallException: Sessions are disabled for load entry point, a subtask of T293958: 1.38.0-wmf.17 deployment blockers, as Resolved.
Fri, Jan 7, 4:50 AM · Patch-For-Review, Release-Engineering-Team (Next), Release, Train Deployments
tstarling added a comment to T292322: Support large files in Shellbox.

Is the procedure the one documented at https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments ?

Fri, Jan 7, 2:39 AM · MW-1.38-notes (1.38.0-wmf.18; 2022-01-17), Patch-For-Review, SRE-swift-storage, Shellbox, serviceops, MW-on-K8s

Thu, Jan 6

tstarling added a comment to T298659: BadMethodCallException: Sessions are disabled for load entry point.

It can be reverted in master, the feature request was not urgent.

Thu, Jan 6, 9:56 PM · MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), Growth-Team, MediaWiki-Recent-changes, MediaWiki-extensions-Scribunto, Wikimedia-production-error
tstarling added a comment to T298225: Database query error occurs when visiting Special:RecentChanges&tagfilter=wikieditor.

I think estimateRowCount() can be used to decide whether to set the STRAIGHT_JOIN option. I tried a few EXPLAIN SELECT queries and they're only out by a factor of 2 or so, which should be good enough.

Thu, Jan 6, 3:56 AM · Verified, Performance-Team (Radar), MW-1.37-release, MW-1.36-release, MW-1.35-release, SecTeam-Processed, MediaWiki-Recent-changes, Growth-Team, Performance-Team-publish, MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), DBA, Vuln-DoS, Security-Team, Security, Wikimedia-production-error, Editing-team (FY2021-22 Kanban Board)
tstarling added a comment to T298225: Database query error occurs when visiting Special:RecentChanges&tagfilter=wikieditor.

It's just a query planning error. If you force the join order then it's fast.

Thu, Jan 6, 2:23 AM · Verified, Performance-Team (Radar), MW-1.37-release, MW-1.36-release, MW-1.35-release, SecTeam-Processed, MediaWiki-Recent-changes, Growth-Team, Performance-Team-publish, MW-1.38-notes (1.38.0-wmf.17; 2022-01-10), DBA, Vuln-DoS, Security-Team, Security, Wikimedia-production-error, Editing-team (FY2021-22 Kanban Board)

Wed, Jan 5

tstarling added a comment to T292322: Support large files in Shellbox.

I was not able to reproduce that error.

Wed, Jan 5, 11:24 PM · MW-1.38-notes (1.38.0-wmf.18; 2022-01-17), Patch-For-Review, SRE-swift-storage, Shellbox, serviceops, MW-on-K8s

Tue, Jan 4

tstarling added a comment to T297517: wtp* hosts: Out of memory (allocated 39845888) (tried to allocate 131072 bytes) in OutputHandler.php.

I filed T298573 for the kernel tuning issue.

Tue, Jan 4, 11:34 PM · User-Ladsgroup, SRE, serviceops, Wikimedia-production-error
tstarling created T298573: System OOM causes random mmap() failure rather than oom-killer.
Tue, Jan 4, 11:33 PM · serviceops

Dec 16 2021

tstarling added a comment to T297667: mysqli/mysqlnd memory leak.

Yeah, it turns out segfaulting once every couple of hours keeps a lid on memory usage. I did a linear regression of the data from 2021-12-16 04:00 to 21:50. mw1414 is leaking 122 MB/hour, and mw1415 is leaking 600MB/hour. Most likely there is a second smaller memory leak, and the task as described is resolved. I would suggest rolling out the patch. This is my last day before vacation. I can do another core dump analysis in the new year.

Dec 16 2021, 10:51 PM · serviceops-radar, WMF-General-or-Unknown
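
The linear regression mentioned above amounts to fitting a straight line to memory usage over time and reading the leak rate off the slope. A minimal least-squares sketch, assuming samples are given as hours since the start of the window and used memory in MB; leak_rate is a hypothetical helper and the production data is not reproduced here.

```python
from typing import Sequence


def leak_rate(hours: Sequence[float], used_mb: Sequence[float]) -> float:
    """Return the least-squares slope of memory (MB) against time (hours)."""
    n = len(hours)
    if n < 2 or n != len(used_mb):
        raise ValueError("need at least two paired samples")
    mean_t = sum(hours) / n
    mean_m = sum(used_mb) / n
    # Slope = covariance(time, memory) / variance(time).
    sxx = sum((t - mean_t) ** 2 for t in hours)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((t - mean_t) * (m - mean_m) for t, m in zip(hours, used_mb))
    return sxy / sxx
```

A series that drops sharply whenever a worker segfaults and restarts would need to be split at those points first, otherwise the fitted slope understates the leak.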

Dec 15 2021

tstarling added a comment to T297667: mysqli/mysqlnd memory leak.

I collected 37 segfault backtraces with systemd-coredump. All indicated that the problem is the lack of Dmitry's followup commit 5f47ce9127ec9123ba2359bb05cf180c8a4177b6. So I added that to my mysqlnd-leak-7.2-backport branch as 5f5ec42235cc15eacd86abe55ebb33bfb677a2ce. I'll do some more local testing but hopefully it will work now.

Dec 15 2021, 10:57 PM · serviceops-radar, WMF-General-or-Unknown
tstarling added a comment to T297667: mysqli/mysqlnd memory leak.

There are 20-30 segfaults per hour, so there's probably a bug with my backport. I'll see if I can figure it out. For now, don't upgrade the rest of the cluster.

Dec 15 2021, 10:05 PM · serviceops-radar, WMF-General-or-Unknown
tstarling added a comment to T297667: mysqli/mysqlnd memory leak.

PHP 7.2 backport: https://github.com/tstarling/php-src/commit/147b5009178cf12cc720b3e19c5ad0be713c2c33

Dec 15 2021, 3:47 AM · serviceops-radar, WMF-General-or-Unknown
tstarling added a subtask for T297667: mysqli/mysqlnd memory leak: T271736: Migrate WMF Production from PHP 7.2 to PHP 7.4.
Dec 15 2021, 2:30 AM · serviceops-radar, WMF-General-or-Unknown
tstarling added a parent task for T271736: Migrate WMF Production from PHP 7.2 to PHP 7.4: T297667: mysqli/mysqlnd memory leak.
Dec 15 2021, 2:30 AM · Performance-Team (Radar), serviceops
tstarling added a comment to T297667: mysqli/mysqlnd memory leak.

The fix was a7305eb539596e175bd6c3ae9a20953358c5d677, included in PHP 7.3, which made mysqlnd use the request-local allocator for result data even for persistent connections.

Dec 15 2021, 2:22 AM · serviceops-radar, WMF-General-or-Unknown
tstarling added a comment to T297667: mysqli/mysqlnd memory leak.

Reduced test case:

Dec 15 2021, 1:41 AM · serviceops-radar, WMF-General-or-Unknown

Dec 14 2021

tstarling added a comment to T297667: mysqli/mysqlnd memory leak.

I think the things being leaked are zend_string objects allocated by mysqlnd_wireprotocol.c line 1380. In the core dump they have a reference count of 1 but no reference to them exists:

Dec 14 2021, 4:16 AM · serviceops-radar, WMF-General-or-Unknown
tstarling added a comment to T297667: mysqli/mysqlnd memory leak.

As discussed at T296098, mw1414 is serving no requests but has high memory usage. Analysis of /proc/<pid>/maps shows that most of the memory usage is in the sbrk() segment which is only used by malloc(), not emalloc(), opcache or APCu. Dumping the segment from a random process and looking at random parts of it showed field name strings:

Dec 14 2021, 3:08 AM · serviceops-radar, WMF-General-or-Unknown
tstarling added a comment to T296098: 1.38.0-wmf.9 seems to have introduced a memory leak.

I'm moving my work on the root cause to T297667: mysqli/mysqlnd memory leak

Dec 14 2021, 2:28 AM · User-Ladsgroup, Release-Engineering-Team (Done by Wed 24 Nov 🧟), Release, Train Deployments
tstarling added a comment to T297517: wtp* hosts: Out of memory (allocated 39845888) (tried to allocate 131072 bytes) in OutputHandler.php.

I filed T297667 for the PHP bug which I'm working on.

Dec 14 2021, 2:26 AM · User-Ladsgroup, SRE, serviceops, Wikimedia-production-error
tstarling created T297667: mysqli/mysqlnd memory leak.
Dec 14 2021, 2:25 AM · serviceops-radar, WMF-General-or-Unknown
tstarling added a comment to T297517: wtp* hosts: Out of memory (allocated 39845888) (tried to allocate 131072 bytes) in OutputHandler.php.

The only thing unique to this report as compared to T296098 and T296063 is the failure mode, i.e. mmap() failure, which as I've said should be fixed by tuning the kernel. Is tuning the kernel the thing that you want unbroken now? Again, it has probably been broken for years.

Dec 14 2021, 12:27 AM · User-Ladsgroup, SRE, serviceops, Wikimedia-production-error
tstarling added a comment to T297517: wtp* hosts: Out of memory (allocated 39845888) (tried to allocate 131072 bytes) in OutputHandler.php.

We're currently on 1.38.0-wmf.9, and this remains a blocker to rolling forward both wmf.12 and this week's wmf.13. Re-raising to UBN!.

Dec 14 2021, 12:24 AM · User-Ladsgroup, SRE, serviceops, Wikimedia-production-error
tstarling added a comment to T297416: Restrict access to most actions on $wgWhitelistRead pages on private wikis.

I don't think it's necessary to deprecate $wgWhitelistRead in order to prevent unauthorized users from executing arbitrary actions. We can just have actions opt in to being allowed in whitelist read mode, the same as we do for API modules with isReadMode():

Dec 14 2021, 12:15 AM · MW-1.37-notes, MW-1.36-notes, MW-1.38-notes (1.38.0-wmf.16; 2022-01-03), MW-1.35-notes, MediaWiki-General, SecTeam-Processed, Sustainability (Incident Followup), User-Ladsgroup, Wikimedia-Site-requests, Security-Team, Security

Dec 13 2021

tstarling added a comment to T296098: 1.38.0-wmf.9 seems to have introduced a memory leak.

I dumped some random parts of the heap of a php-fpm7.2 process on mw1414. It looks like DB query results. Probably mysqli is leaking query results, hence when the query rate increases, the leak rate increases.

Dec 13 2021, 10:44 PM · User-Ladsgroup, Release-Engineering-Team (Done by Wed 24 Nov 🧟), Release, Train Deployments
tstarling added a comment to T296098: 1.38.0-wmf.9 seems to have introduced a memory leak.

OK, I didn't realise mw1414 was depooled with high memory usage, that is useful. I looked at /proc/<pid>/maps. The heap segments, typically at 0x563908311000, account for 36GB of memory usage if you add them all together, out of 41GB total "used" memory. So I think it's probably coming from malloc(), not APC/APCu or emalloc().

Dec 13 2021, 7:40 AM · User-Ladsgroup, Release-Engineering-Team (Done by Wed 24 Nov 🧟), Release, Train Deployments
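
The /proc/<pid>/maps analysis described above can be approximated by summing the sizes of the mappings labelled [heap]. This is a sketch under assumptions: heap_bytes is a hypothetical helper that works on the text of one maps file, and it counts mapped address space rather than resident memory, so it is only a rough stand-in for the "used" figure in the comment.

```python
def heap_bytes(maps_text: str) -> int:
    """Sum the sizes of all [heap] mappings in the text of /proc/<pid>/maps."""
    total = 0
    for line in maps_text.splitlines():
        fields = line.split()
        # Fields: address range, perms, offset, device, inode, optional pathname.
        if len(fields) >= 6 and fields[5] == "[heap]":
            start, end = (int(part, 16) for part in fields[0].split("-"))
            total += end - start
    return total
```

Calling it once per php-fpm process and adding up the results gives a per-host total to set against overall memory usage.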
tstarling lowered the priority of T297517: wtp* hosts: Out of memory (allocated 39845888) (tried to allocate 131072 bytes) in OutputHandler.php from Unbreak Now! to High.

So the wtp* servers were indeed out of memory, as reported at T296098. There were other consequences, for example:

Dec 13 2021, 1:04 AM · User-Ladsgroup, SRE, serviceops, Wikimedia-production-error
tstarling added a comment to T296098: 1.38.0-wmf.9 seems to have introduced a memory leak.

It tried to get a core dump:

Dec 13 2021, 12:32 AM · User-Ladsgroup, Release-Engineering-Team (Done by Wed 24 Nov 🧟), Release, Train Deployments

Dec 12 2021

tstarling added a comment to T296098: 1.38.0-wmf.9 seems to have introduced a memory leak.

It sounds like nobody has a theory as to how an increased query rate could lead to increased memory. It would have been nice if someone could have got a core dump of an affected process.

Dec 12 2021, 10:37 PM · User-Ladsgroup, Release-Engineering-Team (Done by Wed 24 Nov 🧟), Release, Train Deployments
tstarling added a comment to T297517: wtp* hosts: Out of memory (allocated 39845888) (tried to allocate 131072 bytes) in OutputHandler.php.

The message probably indicates that mmap() returned NULL when PHP tried to allocate memory. I don't know why that would happen. You would expect oom-killer to be invoked if the system is out of memory. The message probably occurs randomly on a stressed system, rather than on a process that is using a lot of memory.

Dec 12 2021, 9:45 AM · User-Ladsgroup, SRE, serviceops, Wikimedia-production-error

Dec 2 2021

tstarling added a comment to T296895: LinksUpdate hook review.

@kostajh I am just asking that Growth-Team consider migrating Echo from LinksUpdateAfterInsert to LinksUpdateComplete, as a small tech debt project. Then I can deprecate LinksUpdateAfterInsert.

Dec 2 2021, 10:59 PM · MW-1.38-notes (1.38.0-wmf.18; 2022-01-17), Patch-For-Review, Growth-Team, Platform Team Workboards (MW Expedition), PageImages, Notifications, MediaWiki-extensions-Translate, MediaWiki-Core-Hooks
tstarling added a comment to T176520: Pageimage property (and possibly other page properties) not updated reliably after reverts.

PageImages loads the text of the page from the database during LinksUpdate. Presumably this bug would be fixed if it stopped doing that, as I proposed in T296895.

Dec 2 2021, 12:47 AM · MW-1.38-notes (1.38.0-wmf.16; 2022-01-03), Sustainability (Incident Followup), Wikimedia-database-error, PageImages
tstarling created T296895: LinksUpdate hook review.
Dec 2 2021, 12:11 AM · MW-1.38-notes (1.38.0-wmf.18; 2022-01-17), Patch-For-Review, Growth-Team, Platform Team Workboards (MW Expedition), PageImages, Notifications, MediaWiki-extensions-Translate, MediaWiki-Core-Hooks

Nov 25 2021

tstarling added a comment to T271728: Migration strategy from DOMDocument to Dodo.

Per my comment at T269459#7522285, Dodo has poor performance compared to the PHP DOM extension. I don't think it's fixable without changing some of the design goals. The main problem is increased memory usage, especially an increased number of countable objects, leading to substantial GC overhead. I don't think it is feasible to store the Parsoid DOM using pure PHP linked lists. My recommendation is for Parsoid to go back to using the PHP DOM extension directly, and to instead consider a W3C-compliant layer wrapping DOM data structures for the benefit of extensions, if that is desired.

Nov 25 2021, 12:51 AM · Parsoid (Dodo)

Nov 24 2021

tstarling lowered the priority of T222856: WikiPEG & Parsoid cache rule optimization from Medium to Lowest.
Nov 24 2021, 12:09 AM · WikiPEG, Patch-For-Review
tstarling placed T237618: Amendments to the Gerrit Privilege policy up for grabs.

I am not working on this right now.

Nov 24 2021, 12:07 AM · WMF-General-or-Unknown, TechCom
tstarling added a comment to T254210: ParameterAssertionException "Bad value for parameter $row->rev_timestamp" from RevisionStoreRecord.php.

Any weird data corruption issue which occurs when handling a timeout should be closed as a duplicate of T293568. It's unclear to me whether this bug qualifies since the original error was not inside the exception handler. But T254210#7164506 would certainly qualify.

Nov 24 2021, 12:04 AM · Wikimedia-Timestamp, Platform Team Workboards (Clinic Duty Team), MediaWiki-Revision-backend, Wikimedia-production-error

Nov 23 2021

tstarling merged Restricted Task into T293568: PHP Notice: Undefined offset in wikimedia/remex-html when rendering rest.php error page.
Nov 23 2021, 11:59 PM · Patch-For-Review, Upstream, Excimer, Parsoid, Wikimedia-production-error
tstarling closed T267530: Shellbox command validation, a subtask of T260330: RFC: PHP microservice for containerized shell execution, as Resolved.
Nov 23 2021, 11:50 PM · MW-1.36-notes (1.36.0-wmf.30; 2021-02-09), Shellbox, TechCom-RFC (TechCom-RFC-Closed), Platform Team Workboards (Purple), MW-on-K8s, Patch-For-Review, serviceops, SRE
tstarling closed T267530: Shellbox command validation as Resolved.
Nov 23 2021, 11:50 PM · Shellbox, MW-on-K8s, Platform Team Workboards (Purple)
tstarling closed T278917: Clean up obsolete ActorMigration usages for non-temp tables, a subtask of T161671: Compacting the revision table, as Resolved.
Nov 23 2021, 11:49 PM · Platform Team Workboards (Epics), Platform Team Initiatives (Revision Storage Schema Improvements), MediaWiki-Revision-backend, Multi-Content-Revisions, Epic, Patch-For-Review, Schema-change
tstarling closed T278917: Clean up obsolete ActorMigration usages for non-temp tables, a subtask of T275246: Populate rev_actor and rev_comment_id, as Resolved.
Nov 23 2021, 11:49 PM · MW-1.38-notes (1.38.0-wmf.13; 2021-12-13), MW-1.38-release, Patch-For-Review, Platform Engineering Roadmap, Code-Health-Objective, Platform Team Initiatives (Revision Storage Schema Improvements), Technical-Debt
tstarling closed T278917: Clean up obsolete ActorMigration usages for non-temp tables as Resolved.
Nov 23 2021, 11:49 PM · MW-1.37-notes (1.37.0-wmf.5; 2021-05-11), Platform Team Workboards (MW Expedition), MW-1.36-notes (1.36.0-wmf.38; 2021-04-06), Patch-For-Review

Nov 22 2021

tstarling updated subscribers of T269459: Run performance tests of the new DOM library.

I benchmarked Dodo integrated with Parsoid while parsing a realistic test case ([[Australia]] with local templates and no images), using stock PHP 8.0.12 (no DOM patch).

Nov 22 2021, 11:37 PM · Parsoid (Dodo)

Nov 17 2021

tstarling added a comment to T275246: Populate rev_actor and rev_comment_id.

I updated the checklist. The next step is to do the migration in production. Wikidata reads revision rows from foreign databases, which implies that the configuration for all wikis should be kept in sync. So:

Nov 17 2021, 11:17 PM · MW-1.38-notes (1.38.0-wmf.13; 2021-12-13), MW-1.38-release, Patch-For-Review, Platform Engineering Roadmap, Code-Health-Objective, Platform Team Initiatives (Revision Storage Schema Improvements), Technical-Debt
tstarling updated the task description for T275246: Populate rev_actor and rev_comment_id.
Nov 17 2021, 10:46 PM · MW-1.38-notes (1.38.0-wmf.13; 2021-12-13), MW-1.38-release, Patch-For-Review, Platform Engineering Roadmap, Code-Health-Objective, Platform Team Initiatives (Revision Storage Schema Improvements), Technical-Debt

Nov 12 2021

tstarling updated subscribers of T293568: PHP Notice: Undefined offset in wikimedia/remex-html when rendering rest.php error page.

So it turns out that Dmitry and Nikic were working on the same problem simultaneously, and a fix was just committed. I confirmed that https://github.com/php/php-src/commit/fa0b84a06b03a1c2a2bcadd647232a8a4a90aa05 can be cleanly applied to PHP 7.4, and it fixes the reduced test case. After applying the fix, I was able to send 600 timeout exceptions to Parsoid without it segfaulting or aborting, when compiled in debug mode with AddressSanitizer. Before the fix, it segfaulted after handling 5 timeouts.

Nov 12 2021, 2:49 AM · Patch-For-Review, Upstream, Excimer, Parsoid, Wikimedia-production-error

Nov 11 2021

tstarling added a comment to T293568: PHP Notice: Undefined offset in wikimedia/remex-html when rendering rest.php error page.

Now at https://bugs.php.net/bug.php?id=81610 and https://github.com/php/php-src/pull/7642

Nov 11 2021, 7:08 AM · Patch-For-Review, Upstream, Excimer, Parsoid, Wikimedia-production-error

Nov 10 2021

tstarling added a comment to T293568: PHP Notice: Undefined offset in wikimedia/remex-html when rendering rest.php error page.

I tried moving the opline assignment in ZEND_VM_JMP() to after the interrupt check, but in e.g. ZEND_JMPNZ, the opline has already been incremented by the time ZEND_VM_JMP() is called. It would be necessary to change how conditional jumps work somewhat.

Nov 10 2021, 4:13 AM · Patch-For-Review, Upstream, Excimer, Parsoid, Wikimedia-production-error

Nov 9 2021

tstarling added a comment to T293568: PHP Notice: Undefined offset in wikimedia/remex-html when rendering rest.php error page.

Minimal test case:

Nov 9 2021, 12:46 AM · Patch-For-Review, Upstream, Excimer, Parsoid, Wikimedia-production-error

Nov 8 2021

tstarling added projects to T293568: PHP Notice: Undefined offset in wikimedia/remex-html when rendering rest.php error page: Excimer, Upstream.
Nov 8 2021, 8:39 PM · Patch-For-Review, Upstream, Excimer, Parsoid, Wikimedia-production-error
tstarling added a comment to T293568: PHP Notice: Undefined offset in wikimedia/remex-html when rendering rest.php error page.

That doesn't work because it goes into an infinite tail call loop.

Nov 8 2021, 6:07 AM · Patch-For-Review, Upstream, Excimer, Parsoid, Wikimedia-production-error