User Details
- User Since: Oct 20 2014, 5:25 PM
- Availability: Available
- IRC Nick: AaronSchulz
- LDAP User: Aaron Schulz
- MediaWiki User: Aaron Schulz
Wed, Apr 10
The user needs the movestable or review right. The code could just be changed to also allow moves with autoreview.
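Roughly what I mean, as a hypothetical sketch (names are illustrative, not the actual FlaggedRevs code):

```php
// Also accept 'autoreview' when deciding whether a user may move a page
// with a stable version, alongside the existing rights.
$canMoveStable =
	$performer->isAllowed( 'movestable' ) ||
	$performer->isAllowed( 'review' ) ||
	$performer->isAllowed( 'autoreview' ); // the proposed addition
```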
Mar 28 2024
An example of a diff where lines are added is https://en.wikipedia.org/w/index.php?title=User:Spicy/spihelper_log&curid=74418657&diff=1216025285&oldid=1216024788
Mar 27 2024
For basic maintainability, no new "support" should be added for this "feature".
Mar 19 2024
I think the intent was to have better consistency for doOperations() calls with only a single-step file operation (e.g. not move for swift). If a PUT request fails in the master backend, then not trying it on the second backend keeps things consistent. This doesn't cover the edge case of a non-answer (e.g. timeout or 503) where the objects were ultimately saved.
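Simplified sketch of that rationale (not the actual FileBackendMultiWrite code):

```php
// Only replicate an operation batch to the clone backends when the
// master backend definitively succeeded.
$status = $masterBackend->doOperations( $ops, $opts );
if ( $status->isOK() ) {
	foreach ( $cloneBackends as $cloneBackend ) {
		$status->merge( $cloneBackend->doOperations( $ops, $opts ) );
	}
}
// The edge case above: a timeout/503 from the master counts as a failure
// here even if the objects were ultimately saved.
```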
Mar 2 2024
A couple things I wonder about:
- Though the bottleneck seems to be EventGate more than Kafka, I still wonder why profile::kafka::mirror::properties doesn't blacklist all MW jobs? Is anything making use of that extra data?
- Are there stats on the average byte length of jobs enqueued? Maybe JobQueueEventBus could emit those to find bulky jobs (see the sketch after this list). At least cirrus search jobs are known for being bulky. Maybe there are more. I assume the bulky cirrus job problem will be resolved if the new stream-based updater is deployed (T317045).
- I wonder if JobQueueGroup::lazyPush()/JobQueueEventBus could be rigged to make the provided jobs use "hasty" mode in EventGate?
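On the second point, a hypothetical sketch of what JobQueueEventBus could emit (the event-building helper and stat key are made up):

```php
// Track the serialized byte length of each enqueued job, keyed by job
// type, so that bulky job classes stand out in the stats.
foreach ( $jobs as $job ) {
	$event = $this->eventFactory->createJobEvent( $this->topic, $job ); // assumed helper
	$bytes = strlen( json_encode( $event ) );
	$this->stats->updateCount( "jobqueue.inserts_bytes.{$job->getType()}", $bytes );
}
```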
Feb 27 2024
I think having fatals and warnings is useful and reasonable. What seems confusing now is that we have kind-of-three levels of errors, one of which is confusingly called an "error" type error (a short demo follows the list):
- StatusValue::warning() sets a "warning" error.
- StatusValue::error() sets an "error" error. This seems the most dubious. Is it supposed to mean that something definitely failed, but didn't ruin the whole operation? I can't see where this would come up besides some kind of very unusual sub-status merging, which should probably be done with custom merge logic anyway (possibly changing sub-status fatals to warnings).
- StatusValue::fatal() sets an "error" error (not a "fatal" error), and makes isOK() return false. Since the type is just "error", one can no longer tell which errors caused the fatal (if any, since something could just call setOK( false ) for some reason).
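To illustrate with the current API (behavior as I understand it; worth double-checking):

```php
$status = StatusValue::newGood();

$status->warning( 'example-warning' ); // adds a "warning" type entry
$status->error( 'example-error' );     // adds an "error" type entry
var_dump( $status->isOK() );           // bool(true): still "OK"

$status->fatal( 'example-fatal' );     // adds another "error" type entry...
var_dump( $status->isOK() );           // bool(false): ...and unsets OK

// Both the error() and fatal() entries come back as type "error", so
// getErrors() cannot say which one actually made the status fatal.
foreach ( $status->getErrors() as $error ) {
	echo "{$error['type']}: {$error['message']}\n";
}
```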
Jan 30 2024
I don't think anything new should get built on top of $wgSharedTables or setTableAliases(). Those are basically old tech debt, akin to inserting things into the middle of a Jenga tower and hoping for the intended result.
Jan 25 2024
I'm strongly in favor of getting rid of the mess that is $wgSharedTables. Any mechanism that supports sharing tables needs to be designed to be aware of that, using appropriate methods to get database handles that are documented as being on possibly different databases. The caller would at least know that certain tables are co-located.
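For example, something along these lines (the 'virtual-foo' domain is made up; getPrimaryDatabase() is the kind of method I have in mind):

```php
use MediaWiki\MediaWikiServices;

// The caller explicitly asks for a handle on a named (possibly foreign)
// domain instead of having table names silently swapped underneath it.
$dbw = MediaWikiServices::getInstance()
	->getConnectionProvider()
	->getPrimaryDatabase( 'virtual-foo' );
// Since this handle may be on a different database than the local wiki,
// the caller knows not to JOIN against local-wiki tables.
```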
I'd rather not have the writes be cross-datacenter, tying up the worker thread in the post-send stage of web requests, especially given the potential for spikes. If there was some aggregation service in the middle (batching/flushing counter updates), that would be less risky, though more complex.
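Roughly the shape I have in mind, as a made-up sketch (no such service exists):

```php
// Buffer counter increments in-process and flush them as one batched,
// asynchronous update, so web requests never block on a
// cross-datacenter write in the post-send stage.
class BufferingCounterStore {
	/** @var array<string,int> */
	private array $pending = [];

	public function increment( string $key, int $delta = 1 ): void {
		$this->pending[$key] = ( $this->pending[$key] ?? 0 ) + $delta;
	}

	public function flush( callable $send ): void {
		if ( $this->pending ) {
			$send( $this->pending ); // e.g. one async POST to the aggregator
			$this->pending = [];
		}
	}
}
```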
Dec 13 2023
Is this still an issue? I'm not seeing this in any Kibana production logs.
Dec 11 2023
One last point: the configuration arrays include a 'db' key, which is actually a DB domain. Should 'db' be enforced as a database name only (not a full DB domain), maybe renamed to 'dbname'? Or should it be renamed to 'dbDomain', or maybe something else?
What about VirtualDomainsMapping vs VirtualDomainMapping? Maybe it could even be DatabaseVirtualDomainMapping since this has nothing to do with vhosts or anything like that?
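For concreteness, the ambiguity looks like this (values made up):

```php
$wgVirtualDomainsMapping = [
	'virtual-foo' => [ // made-up virtual domain
		'db' => 'wikishared', // a plain database name...
		// 'db' => 'wikishared-someschema-prefix_', // ...or a full DB domain ID?
		'cluster' => 'extension1',
	],
];
```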
Dec 4 2023
So, without T246371 being done yet, it looks like the current Apache config on the jobrunners won't let you use RunSingleJobHandler, only the /rpc/* stuff. So, either:
- an exception would have to be made for /w/rest.php/eventbus/v0/internal/job/execute within modules/profile/templates/mediawiki/jobrunner/site.conf.erb, or
- rpc/RunSingleJob.php would have to be used temporarily by the search updater and migrated later.
One thing to also fix here is that things like SELECT ... FOR UPDATE, SELECT GET_LOCK(), really any SELECT, should be exempted from the transaction size check in approvePrimaryChanges(); there is no use in a ROLLBACK at that point. I'll make a patch for that.
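Roughly the exemption I have in mind (simplified; not the actual rdbms code):

```php
// Inside the pre-commit accounting loop: skip read queries, since a
// ROLLBACK at that point cannot undo the time already spent on SELECTs
// (including SELECT ... FOR UPDATE and SELECT GET_LOCK() calls).
if ( preg_match( '/^\s*SELECT\b/i', $sql ) ) {
	continue; // not counted against the transaction write limit
}
$writeTime += $queryTime; // illustrative accounting of write queries only
```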
Nov 16 2023
Re-reading the description and comments, it seems to me that this task is more about addQuotes(), phan-taint, and a more fluent-looking interface than about eliminating raw SQL from code calling rdbms query methods. Is the idea to keep most of the build*() methods around? Currently, the $field argument to Database::expr() allows raw SQL. Is that intentional in the long run? If so, then buildMultiUpsertSetForOverwrite() can just use SUBSTR() in the $field argument. SelectQueryBuilder::fields() allows raw SQL fields and SelectQueryBuilder::where() allows raw SQL. On the other hand, SelectQueryBuilder::table() wants a query builder for computed/derived tables rather than a string from selectSQLText() or such.
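For example, with the current APIs (as I understand them; exact signatures worth double-checking):

```php
$res = $dbr->newSelectQueryBuilder()
	->select( [ 'page_id', 'len' => 'LENGTH(page_title)' ] ) // raw SQL allowed
	->from( 'page' )
	->where( $dbr->expr( 'page_namespace', '=', 0 ) ) // structured expression
	->andWhere( 'page_len > 1000' ) // ...yet raw SQL is also allowed here
	->caller( __METHOD__ )
	->fetchResultSet();
```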
Nov 13 2023
Maybe SyncObjectStash is an OK name?
Nov 9 2023
I mean that the job would never be enqueued into Kafka; it would be executed by specifying the serialized job to RunSingleJobHandler, which runs it on the fly and returns a status code. If retries are enabled, that just changes the response status in some cases, but the endpoint itself does not try to re-enqueue the job (in this case, a never-enqueued job). The Job class would decide what success/error info is propagated up to the JSON response. None of this would involve touching Kafka. The stream updater would just be POSTing requests to the RunSingleJobHandler to "refresh this page" and reading the response.
I feel like the current MainStash could just as well, or better, be called a WANObjectStash, as opposed to one that routes to a single datacenter. Essentially, the main stash now is one that must perform in multiple DCs, sacrificing more consistency in order to do so. It's tricky to come up with a good name though. Maybe:
- ColocaleStash
- OneRegionStash
- LinearStash (e.g. best-effort tries to be linearizable for single-key operations)
It looks like the core MW ApiQuerySearch module does not require POST, and the HTTP body parameters in the linked code don't seem like they would be too long, even 'gsrsearch', which is basically just a title AFAIK. So this looks like a matter of changing the RESTBase call to GET and moving the body parameters to URL parameters. The larger work probably lies in setting up and testing RESTBase.
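A sketch of the client-side change (the calling code here is illustrative, not the real one):

```php
// Move the search parameters from the POST body into the query string.
$url = wfAppendQuery( $searchApiUrl, [
	'action' => 'query',
	'list' => 'search',
	'srsearch' => $term, // or 'gsrsearch' in the generator form
	'format' => 'json',
] );
$req = $httpRequestFactory->create( $url, [ 'method' => 'GET' ], __METHOD__ );
$req->execute();
```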
If T246371 was done, then the stream updater could just make a request to run a serialized job (provided in the same request) for each page that needs updating. The job would do the backend work of prop=cirrusbuilddoc and also the search index updates. Though the job would have to be registered in $wgJobClasses, nothing would actually enqueue the jobs. The error/retry logic and ability to curtail concurrency would be controlled by the stream updater rather than the generic job runner logic. Would that avoid the job queue issues?
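Sketched out, with the endpoint path from T246371 and a made-up payload shape and client:

```php
// The stream updater executes a never-enqueued job synchronously by
// POSTing its serialized form and reading back the JSON status.
$payload = json_encode( [
	'type' => 'cirrusSearchIndexRefresh', // hypothetical job type
	'params' => [ 'pageId' => $pageId ],  // hypothetical job params
] );
$response = $client->post( // $client: e.g. a Guzzle HTTP client
	'https://jobrunner.example/w/rest.php/eventbus/v0/internal/job/execute',
	[ 'body' => $payload, 'headers' => [ 'Content-Type' => 'application/json' ] ]
);
// Retry and concurrency policy live in the stream updater, based on the
// response; Kafka is never involved.
```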
Nov 7 2023
I don't think API Platform will take this up anytime soon, but I wouldn't mind helping with review.
Oct 31 2023
CC'ing @Nikerabbit. The new messages can clone the old translations. I don't know the exact TWN process for that, but it should apply here.
I think we should nail this down before things get baked into stable releases and compatibility is hard to break.
Oct 20 2023
That sounds consistent with the operations failing in one data-center (the closest to the uploader) and then getting cleaned up by the periodic sync script.
Oct 19 2023
I wonder if the auth token just expired while the combined file was being uploaded (but after the PUT started). By the time the chunk deletions are issued, perhaps the token expired. AFAIK, the tokens should last for a day and MediaWiki app servers only reuse them for ~15 minutes (which used to be higher, but they seemed to expire too soon). It would be interesting to see the memcached stats for the token storage on the swift servers (maybe there is eviction pressure for some reason, like some service never reusing tokens and thus flooding memcached).
I'm noticing that GroupPermissionsLookup is poorly documented.