The drop can also be seen in the ApiOptions p99 request service time, which dropped from ~25 seconds to ~5 seconds. The median service time is about 700ms, and for pretty much all of that time, the user row is locked, since User::getInstanceForUpdate() is the first thing the module does, and User::saveSettings() is the last thing it does before output.
Tue, Apr 20
It's interesting to see that there are so many JS callers of mw.Api().saveOption(). It reminds me of T128602. I think preferences could be redesigned around this use case -- it was originally imagined as mostly being set in bulk via Special:Preferences, rather than being a generic user-linked data store.
That concludes the code search for WMF deployed extensions. Any other extensions using the old ActorMigration fields will emit deprecation warnings after https://gerrit.wikimedia.org/r/c/mediawiki/core/+/676179 is merged.
Mon, Apr 19
According to https://www.mediawiki.org/wiki/Extension:Translate , the Translate extension has a "master" backwards-compatibility policy going back to 1.33, but its extension.json says it requires MW 1.34+. Can @Nikerabbit please confirm that it is safe to drop the ActorMigration calls in the Translate extension that are there to support MW 1.33 and earlier?
Sun, Apr 18
Fri, Apr 16
We could just remove the exception. Log a warning instead.
Tue, Apr 13
Mon, Apr 12
I would suggest temporarily removing ipblocks from $wgSharedTables during the upgrade. Re-add it once all wikis are upgraded.
I am seeing the same error migrating ipblocks, probably due to $wgSharedTables having ipblocks in it. I think the solution will be to unshare the tables.
Fri, Apr 9
The URLs were changed to point to Commons, but transformVia404 has no effect since it is implemented in the parent File::transform(), which is not called. Also, there is a bug in the cache expiry code, meaning the thumbnail is downloaded every time instead of once per month. So the full thumbnail is downloaded and stored every time a page containing a Commons image is rendered.
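For reference, a correct once-per-month expiry check would look something like this sketch (the function and constant names here are illustrative, not the actual File code):

```python
import time

ONE_MONTH = 30 * 24 * 3600  # cache remote thumbnails for about a month

def thumb_is_fresh(mtime: float, now: float) -> bool:
    # Re-download only when the cached copy is older than a month.
    # The expiry bug described above behaved as if this always returned False.
    return (now - mtime) < ONE_MONTH

now = time.time()
print(thumb_is_fresh(now - 3600, now))            # cached an hour ago: fresh
print(thumb_is_fresh(now - 40 * 24 * 3600, now))  # cached 40 days ago: stale
```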
Thu, Apr 8
Wed, Apr 7
I grepped the Ruffle source and found no implementation of cross-domain policy. I found a couple of complaints in the Ruffle bug tracker about it not supporting crossdomain.xml. The response from the devs is "use CORS".
Ruffle is also provided as a native browser extension, so I suppose that should be reviewed for security.
But I suppose someone might have a copy predating that "EOL will come" time bomb.
The problem is that the ob_end_clean() in wfResetOutputBuffers() results in a call to OutputHandler::callback(), so Content-Length: 0 is sent. Previously the buggy code was only activated if the request was HTTP/1.0, but Aaron's patch exposed it for other kinds of requests.
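The failure mode can be illustrated with a toy output buffer in Python. This is only an analogy for PHP's behaviour, where ending a buffer with ob_end_clean() still invokes the installed callback (here, OutputHandler::callback()), which then emits a header based on an empty buffer:

```python
# Toy model: ending an output buffer still runs its handler, so a handler
# that derives a Content-Length header from the (now empty) buffer content
# ends up sending "Content-Length: 0".
class OutputBuffer:
    def __init__(self, handler):
        self.handler = handler
        self.data = ""

    def write(self, s):
        self.data += s

    def end_clean(self):
        # Discard buffered output, but still invoke the handler, mirroring
        # how ob_end_clean() triggers the installed output callback.
        self.handler("")
        self.data = ""

headers = []

def content_length_handler(buf):
    headers.append(f"Content-Length: {len(buf)}")
    return buf

ob = OutputBuffer(content_length_handler)
ob.write("hello")
ob.end_clean()
print(headers)  # ['Content-Length: 0']
```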
Tue, Apr 6
A wholesale revert is not really a resolution. Maybe we can reopen with lower priority and without the train blocker parent task?
Wed, Mar 31
I'm splitting out the cleanup task to T278917.
Tue, Mar 30
I'm trying to understand why it is locking tables at all. Apparently I added it in June 2004 (72652c4bc72e7043f3bf12b71abc959510132b5d). I'm not sure why; does anyone have any idea? I tried reading the relevant chapter in the MySQL 3.23 manual, but there's no suggestion there to lock tables.
Sun, Mar 28
The change was made to allow MediaWiki to be installable with the UTF-8 character set selected. The maximum key length in MyISAM is 1000 bytes: https://dev.mysql.com/doc/refman/8.0/en/myisam-storage-engine.html . With the index (job_cmd, job_namespace, job_title), job_namespace is 4 bytes and job_title is 255*3=765 bytes, leaving 231 bytes for job_cmd. Before the referenced patch, job_cmd would have required 765 bytes, causing an error on install, so I reduced it to 60 bytes, believing that would be long enough for such an identifier.
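The arithmetic above can be checked directly (assuming the historical 3-byte utf8 charset, where a VARCHAR(n) column contributes n*3 bytes to a key):

```python
# MyISAM limits a key to 1000 bytes. With 3-byte utf8, VARCHAR(n)
# contributes n * 3 bytes to the key.
MAX_KEY_BYTES = 1000

job_namespace = 4    # INT column, 4 bytes
job_title = 255 * 3  # VARCHAR(255) utf8, 765 bytes
remaining = MAX_KEY_BYTES - job_namespace - job_title
print(remaining)     # bytes left for job_cmd: 231

# A VARCHAR(255) job_cmd would need 765 bytes and overflow the limit;
# VARCHAR(60) needs only 60 * 3 = 180 bytes, which fits.
print(60 * 3 <= remaining)
```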
Thu, Mar 25
Ideas from technical planning meeting:
- Make a separate hook runner for use within load.php?
- Preload? (T240775) Should be easy after k8s deployment.
- Optimise PSR-4 code
- Concatenate files
- Have abstract classes representing hook modules, multiple hooks per class
There's a lot of cleanup of deprecated code which needs to be done first.
Wed, Mar 24
Mar 22 2021
Mar 21 2021
Mar 19 2021
- Request flow A
For the login request [...]
We instead rely on "sticky DC" cookies to pin a user to a data centre for the few seconds around session replication
Mar 18 2021
Mar 17 2021
Trait which reads @noDebugInfo doc comment annotations: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/672845
If it's worth doing, it's worth overdoing, am I right?
I would suggest high priority for that, per https://www.mediawiki.org/wiki/Phabricator/Project_management#Priority_levels . I am regularly reviewing all UBN tasks.
Hi, I see you triaged this task as Unbreak Now. Do you need help?
As Thiemo says, please do not misuse UBN priority.
Mar 16 2021
Let's leave this until we have some sort of confirmation that it would actually help us. I reviewed the Disconnect Firefox extension source code -- entities.json doesn't actually seem to be used there. But it looks like everything in entities.json has a corresponding entry in services.json, and everything in services.json is categorized as some sort of tracker. So the reason we're not in entities.json is because we're not considered to be a tracker. Probably adding Wikimedia would give people the option to block Wikimedia in the extension configuration. There are a number of open or closed bug reports against the list from companies asking to be removed, and from users asking for trackers to be added, but nobody is asking for non-trackers to be added to the list. So, maybe it is harmful.
Mar 15 2021
If you propose a puppet change, I can review it and verify that it's working after deployment.
Mar 12 2021
Writing that patch forced me to properly review all current usages of session storage. Foreign API tokens also need DC pinning or some other special solution.
Mar 11 2021
Is it necessary for tags to be non-overlapping? A query can be slow without being an error, and there can be errors without slowness.
As an optimisation, special routing for those path prefixes could be skipped if there is no session or token cookie in the request. Anonymous auto-login is always going to fail and will account for most requests. The code would be the same as the existing special cases for session and token cookies in the Varnish and ATS configuration.
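A sketch of that routing decision follows. The real logic would live in the Varnish/ATS configuration; the cookie names and path prefixes here are illustrative assumptions, not the actual configuration values:

```python
# Sketch: skip the special path-prefix routing when the request carries no
# session or token cookie, since anonymous auto-login always fails and
# those requests can safely take the normal route.
SPECIAL_PREFIXES = ("/w/api.php", "/w/index.php")        # assumed examples
AUTH_COOKIES = ("enwikiSession", "enwikiToken", "centralauth_Token")

def needs_special_routing(path: str, cookies: dict) -> bool:
    if not path.startswith(SPECIAL_PREFIXES):
        return False
    # Anonymous requests (no session/token cookie) use normal routing.
    return any(name in cookies for name in AUTH_COOKIES)

print(needs_special_routing("/w/api.php", {}))                      # False
print(needs_special_routing("/w/api.php", {"enwikiSession": "x"}))  # True
```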
@aaron and I reviewed the memcached PECL code, but we did not find anything which would cause this.
Mar 10 2021
It completed after 10 minutes.
When I looked at the file description page, I missed the fact that the file history is pageable. There are 778 versions of this file. I'll just delete it with deleteBatch.php. Please advise Tholden28 not to do that in future.
Actually, grepping for that request ID in FileOperation.log shows evidence of MediaWiki doing many, many requests to Swift in those 200 seconds. So maybe I was too quick to blame it on Swift. There are about 3000 lines like
This is a reproducible timeout in swift. Following is the backtrace from when I tried it.
Mar 9 2021
The relevant user (user_id=9299073) only has the one revision, there are no suppressed revisions for that user. I'm not sure how the transaction got split up, since neither the current code nor the code in MW 1.15 seems to have explicit transactions, so you would think there would be one transaction covering both the ipblocks insertion and the revision update. It wasn't a race between a new edit and a suppression: the edit occurred at 04:56 and the suppression was at 10:46.
User suppression is supposed to set the DELETED_RESTRICTED bit in rev_deleted, but rev_deleted is 0 in this case. But are we really talking about a suppression from 2009? The logs have been purged, so I can't investigate a possible failed suppression from 2009. If there aren't many affected revisions, you can just suppress them manually.
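For clarity, rev_deleted is a bitfield, and the check amounts to a bitwise test. The constants below follow MediaWiki's RevisionRecord values; a rev_deleted of 0 means the suppression bit was never set:

```python
# MediaWiki's rev_deleted bitfield (values as in RevisionRecord).
DELETED_TEXT = 1
DELETED_COMMENT = 2
DELETED_USER = 4
DELETED_RESTRICTED = 8

def is_suppressed(rev_deleted: int) -> bool:
    # Suppression sets DELETED_RESTRICTED; rev_deleted == 0 shows the
    # bit was never set for the affected revision.
    return bool(rev_deleted & DELETED_RESTRICTED)

print(is_suppressed(0))                                  # False
print(is_suppressed(DELETED_USER | DELETED_RESTRICTED))  # True
```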
I tried deleting it and confirmed that it's slow.
Tagging FeaturedFeeds since that's what it looks like from the message names in the task description.
Mar 8 2021
I think the criteria should be:
Mar 5 2021
Indeed, in a default installation it still appears to work. I was able to reproduce the problem in a new Firefox profile by enabling "strict" enhanced tracking protection. And I was able to permit cross-site login by disabling tracking protection for en.wikipedia.org via the shield icon.
Mar 4 2021
I see this is getting some heat. I think we need a power-user workaround, involving editing about:config.
I did that manual testing, it seems to work. Sorry for the noise on a mostly irrelevant task.
We need to do some manual testing before Excimer is released, like a high-concurrency multi-threaded stress test, since Remi is questioning whether it will work even after https://gerrit.wikimedia.org/r/c/mediawiki/php/excimer/+/667970 . Remi is saying we should just make configure fail with the error "excimer does not support ZTS".
Mar 3 2021
On the upstream bug I asked for architecture advice, since I don't see a way to make our SUL system work with state partitioning. I'm also asking for temporary whitelisting since a full rearchitecture of sign-on (we might call it SUL3) is going to take months. Tim Huang, a relevant developer and co-author of the "Introducing State Partitioning" article on the Mozilla blog, is the "triage owner" of my bug, but it's unclear whether that means he has been notified. The next step for progressing the upstream bug is to make sure Tim Huang is aware of it.
I filed the upstream bug https://bugzilla.mozilla.org/show_bug.cgi?id=1696095
I tried to reproduce this locally with P14574, which throws an exception once per millisecond, and if an exception occurs during client destruction, it reuses the client. But it didn't turn up anything. Maybe in production?
config.m4 was forcing the compiler option -DZEND_ENABLE_STATIC_TSRMLS_CACHE=1, but that option is broken for shared extensions due to some kind of dynamic linking issue. The assembly generated by gcc -S is correct, but _tsrm_ls_cache ends up at the wrong address.
Mar 2 2021
It's a bit comical given the amount of code devoted to supposedly supporting ZTS mode. Evidently I never got around to testing it.
Please do not use the Unbreak Now priority for things that are not actually urgent. I am reviewing all tasks with this priority every day.
Mar 1 2021
I'm lowering the priority since it's hard to imagine some deprecation warnings preventing you from creating wikis. The actual blocker is presumably T275452.
I think the best way to test it is to actually run it for real in production. In principle, the script should be re-runnable if it fails halfway through. But don't use altwiki since that wiki seems to be fully set up and working fine. Which wikis need to be created?
Oh right, there is a separate task for the exception, which was fixed already.