[RfC] Move RunJobs.php to the mediawiki (core) repository
Open, Normal, Public

Description

Historically, the HTTP-facing script for running jobs on the jobrunners (RunJobs.php) resides in the rOMWC Wikimedia - MediaWiki Config repository, despite the fact that it is not really configuration material.

Instead, we think the rpc/RunJobs.php and rpc/RunSingleJob.php scripts should be moved into the main rMW MediaWiki repository under the /rpc hierarchy (as this is the current location they are being synced to during deployment).

Rationale:

  • They should be treated as software, not configuration
  • They are not WMF-specific, but rather, specific to MW installs using an HTTP interface for running their jobs asynchronously.
  • They are tightly coupled with the MW JobQueue codebase and infrastructure.
  • Easier sharing with other environments (BetaCluster, MW-Vagrant, etc)

Thoughts? Suggestions? Objections?

Event Timeline

mobrovac created this task. Sep 6 2017, 12:01 PM
Restricted Application added a subscriber: Aklapper. Sep 6 2017, 12:01 PM

@Legoktm @Anomie @aaron @Krinkle @tstarling any thoughts/objections/suggestions on this?

mobrovac triaged this task as Normal priority. Sep 14 2017, 3:30 PM
mobrovac updated the task description.

TL;DR: This proposal seems to be based on false premises.

Instead, we think the rpc/RunJobs.php and rpc/RunSingleJob.php scripts should be moved into the main rMW MediaWiki repository under the /rpc hierarchy (as this is the current location they are being synced to during deployment).

This does not seem to be correct. There is no /rpc hierarchy in the WMF-deployed version of the MediaWiki codebase, i.e. at /srv/mediawiki/php-$VERSION/rpc or anywhere under /srv/mediawiki/w or /srv/mediawiki/docroot.

The files in the rpc subdirectory of the rOMWC Wikimedia - MediaWiki Config repository are present on disk at /srv/mediawiki/rpc, because that's where all of that repository is checked out. But it isn't served publicly as it would be if it were added to the rMW MediaWiki repository.

  • They are not WMF-specific, but rather, specific to MW installs using an HTTP interface for running their jobs asynchronously.

This also does not seem to be correct. MediaWiki's "HTTP interface running [...] jobs asynchronously" is Special:RunJobs, already in core, which is posted to from MediaWiki::triggerAsyncJobs(). It takes care that it cannot be arbitrarily triggered from the public web.

MediaWiki's command-line interface for running jobs asynchronously (e.g. from cron) is maintenance/runJobs.php, which is also already in core.

On WMF sites, the RunJobs.php and RunSingleJob.php scripts are served internally on localhost only on the jobrunner hosts, and hit via curl from the code in rGJOB jobrunner via configuration in puppet. The configuration previously just executed maintenance/runJobs.php; rOPUP3f2beeb4b10f: Use the new "dispatcher" config format and use curl with HHVM implies that the change to hitting it via internal HTTP was something to do with HHVM.

  • They are tightly coupled with the MW JobQueue codebase and infrastructure.

This also does not seem to be correct. While the scripts themselves depend on MediaWiki's JobQueue interface, they don't depend extensively on the implementation details. And MediaWiki core doesn't know a thing about them.

  • Easier sharing with other environments (BetaCluster, MW-Vagrant, etc)

Beta Cluster is a non-issue, since it uses the same rOMWC Wikimedia - MediaWiki Config.

Tgr added a comment. Sep 18 2017, 4:41 AM

See T166010: The Great Namespaceization and Reorg and T167038: Separate "application" and "project" concerns for somewhat related discussions. These are basically entry points; IMO they should be handled as other entry points (which are either in MediaWiki root or in maintenance). (Although if someone wanted to move all entry points out of the root, that would be nice. But they should have a more generic name than "rpc". That terminology is not used in PHP anyway.)

These are basically entry points;

They're hacky private entry points for WMF's job runner infrastructure with HHVM rather than generally-useful or public entry points. Which is why I oppose putting them in MediaWiki core.

If they need moving they should probably be moved into puppet with the rest of the WMF configuration for the job runners, although that would make it somewhat harder for non-Ops devs to update them if necessary.

TL;DR: This proposal seems to be based on false premises.

Instead, we think the rpc/RunJobs.php and rpc/RunSingleJob.php scripts should be moved into the main rMW MediaWiki repository under the /rpc hierarchy (as this is the current location they are being synced to during deployment).

This does not seem to be correct. There is no /rpc hierarchy in the WMF-deployed version of the MediaWiki codebase, i.e. at /srv/mediawiki/php-$VERSION/rpc or anywhere under /srv/mediawiki/w or /srv/mediawiki/docroot.

I didn't mean to imply mw-core/rpc already exists, but rather that we could keep the same hierarchy. I do concur with @Tgr here that a more appropriate name could be employed.

The files in the rpc subdirectory of the rOMWC Wikimedia - MediaWiki Config repository are present on disk at /srv/mediawiki/rpc, because that's where all of that repository is checked out. But it isn't served publicly as it would be if it were added to the rMW MediaWiki repository.

Whether something is exposed to the public depends on the rewrite rules we have in the Apache2 configuration. These end points should definitely be exposed only to the production environment as we don't want random people to trigger JobQueue executions.

  • They are not WMF-specific, but rather, specific to MW installs using an HTTP interface for running their jobs asynchronously.

This also does not seem to be correct. MediaWiki's "HTTP interface running [...] jobs asynchronously" is Special:RunJobs, already in core, which is posted to from MediaWiki::triggerAsyncJobs(). It takes care that it cannot be arbitrarily triggered from the public web.
MediaWiki's command-line interface for running jobs asynchronously (e.g. from cron) is maintenance/runJobs.php, which is also already in core.

Right. Special:RunJobs can be used by MW users and maint/RunJobs.php by site admins, so why not include an HTTP end point that could be used by APIs and enable application or service composition?

  • They are tightly coupled with the MW JobQueue codebase and infrastructure.

This also does not seem to be correct. While the scripts themselves depend on MediaWiki's JobQueue interface, they don't depend extensively on the implementation details. And MediaWiki core doesn't know a thing about them.

But MW core provides the interfaces all these approaches use and in fact MW itself assumes that the jobs will be run regularly (whether by directly running any of the aforementioned scripts or by telling MW to run them piggy-backing on client requests). This seems to reinforce the argument that RunJobs and RunSingleJob are part of the code, not configuration.

Also, one important thing to note for this discussion is that, unlike rpc/RunJobs.php which relies on the existence of a Redis queue, rpc/RunSingleJob.php has to receive a full job definition (type, params, etc) in the request body to execute and then simply relays it for actual job execution to whichever back-end is configured. In that sense rpc/RunSingleJob.php is more general than rpc/RunJobs.php.
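
To make the description of rpc/RunSingleJob.php above concrete, here is a minimal Python sketch of an endpoint body that receives a full job definition and relays it to a handler. It is not the actual script: the JOB_HANDLERS registry, the register decorator, the exampleJob type and the response shape are all hypothetical, and the JSON field names "type" and "params" are taken only from the wording above. The sketch deliberately performs no access check, ordering or rate limiting.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping job types to handlers (names are invented).
JOB_HANDLERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {}


def register(job_type: str):
    """Decorator that records a handler for the given job type."""
    def decorator(func):
        JOB_HANDLERS[job_type] = func
        return func
    return decorator


@register("exampleJob")
def example_job(params: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for real work: echo the parameters back.
    return {"echo": params}


def run_single_job(body: str) -> Dict[str, Any]:
    """Parse a JSON job definition and hand it to the matching handler."""
    try:
        spec = json.loads(body)
    except json.JSONDecodeError:
        return {"status": "error", "message": "invalid JSON"}
    if not isinstance(spec, dict) or not isinstance(spec.get("type"), str):
        return {"status": "error", "message": "missing job type"}
    handler = JOB_HANDLERS.get(spec["type"])
    if handler is None:
        return {"status": "error", "message": "unknown job type"}
    # The job runs immediately; nothing is read from or written to a queue.
    return {"status": "ok", "result": handler(spec.get("params", {}))}
```

Calling run_single_job('{"type": "exampleJob", "params": {"page": "Foo"}}') runs the handler straight away, which is the sense in which such an endpoint does not depend on a particular queue back-end.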

These are basically entry points;

They're hacky private entry points for WMF's job runner infrastructure with HHVM rather than generally-useful or public entry points. Which is why I oppose putting them in MediaWiki core.

This might have been true for rpc/RunJobs.php, but it's not for rpc/RunSingleJob.php (see above).

If they need moving they should probably be moved into puppet with the rest of the WMF configuration for the job runners, although that would make it somewhat harder for non-Ops devs to update them if necessary.

Again, I don't think putting code into any type of configuration repo is a good and/or sustainable practice.

I didn't mean to imply mw-core/rpc already exists, but rather that we could keep the same hierarchy. I do concur with @Tgr here that a more appropriate name could be employed.

There's no existing hierarchy to keep. Placing it in a similarly-named subdirectory in a different place with different semantics isn't "keep[ing] the same hierarchy".

Whether something is exposed to the public depends on the rewrite rules we have in the Apache2 configuration. These end points should definitely be exposed only to the production environment as we don't want random people to trigger JobQueue executions.

And what about most third parties who don't have fancy Apache rewrite rules? Or are you blocking this on a not-yet-filed task to move public entry points to a subdirectory and change the MediaWiki installation instructions to state that only that directory is safe to expose to the public? Keeping in mind that, as they currently exist, these two scripts are unsuited to be in said directory.

Right. Special:RunJobs can be used by MW users and maint/RunJobs.php by site admins, so why not include an HTTP end point that could be used by APIs and enable application or service composition?

If a general HTTP endpoint to trigger a job run is really needed (I'm very skeptical on that point), it would need to be properly rights-limited. These scripts are not, and making RunJobs.php do so would probably defeat its purpose.

But MW core provides the interfaces all these approaches use and in fact MW itself assumes that the jobs will be run regularly (whether by directly running any of the aforementioned scripts or by telling MW to run them piggy-backing on client requests).

Which has nothing to do with your proposal to add these specific scripts to MediaWiki core.

Also, one important thing to note for this discussion is that, unlike rpc/RunJobs.php which relies on the existence of a Redis queue, rpc/RunSingleJob.php has to receive a full job definition (type, params, etc) in the request body to execute and then simply relays it for actual job execution to whichever back-end is configured. In that sense rpc/RunSingleJob.php is more general than rpc/RunJobs.php.

[...]

They're hacky private entry points for WMF's job runner infrastructure with HHVM rather than generally-useful or public entry points. Which is why I oppose putting them in MediaWiki core.

This might have been true for rpc/RunJobs.php, but it's not for rpc/RunSingleJob.php (see above).

I'm not particularly convinced by the existence of a script you yourself added last month with no clear use case indicated.

Now that I look at it a little closer, it seems to bypass all the job queuing entirely to simply run code with no ordering or rate limiting. Which seems even less like something that would be useful or non-dangerous for anyone outside of WMF teams trying to replace the MediaWiki job queue entirely with a complex external software stack.

And I see it depends on code that's not currently a part of MediaWiki core, although I see a comment on 7e3336cb that merging it into core is a future intent.

If they need moving they should probably be moved into puppet with the rest of the WMF configuration for the job runners, although that would make it somewhat harder for non-Ops devs to update them if necessary.

Again, I don't think putting code into any type of configuration repo is a good and/or sustainable practice.

Puppet isn't just a configuration repo. It also contains many small scripts of the type a sysadmin would write to glue things together. Like RunJobs.php, for example.

Krinkle edited projects, added TechCom-RFC; removed Proposal. Sep 19 2017, 5:49 PM
Krinkle added a comment. Edited Sep 19 2017, 5:59 PM

I'd like to consider re-using existing infrastructure if possible, especially if it means not adding more .php entry points to MediaWiki.

For the rpc/runJobs.php script WMF uses, I'm curious whether it would make sense to consider merging it into core's SpecialRunJobs, which seems rather similar.

| SpecialRunJobs | rpc/runJobs |
| --- | --- |
| multiple jobs, optionally by type | multiple jobs, optionally by type |
| in document root | outside document root |
| restrict use via secret key | validates $_SERVER['REMOTE_ADDR'] |
| runs through Multiversion, WebStart, MediaWiki.php | runs through Multiversion, WebStart, MediaWiki.php |
| enters from index.php, parses title to construct instance | constructs instance directly |

The only obvious difference seems the (small?) overhead of index.php title parsing, wiki-routing, and restriction guards using secret keys vs remote addr validation. If we prefer remote_addr as a way of restricting access, we could add an option for that to SpecialRunJobs. Similarly, RunSingleJob could become a special page.
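
The two ways of restricting access compared above can be sketched side by side. This is an illustrative Python sketch, not MediaWiki's code: SECRET_KEY, ALLOWED_NETWORKS, the "sigexpiry" and "signature" parameter names, and the choice of HMAC-SHA256 over sorted parameters are all assumptions made for the example.

```python
import hashlib
import hmac
import ipaddress
import time
from typing import Optional
from urllib.parse import urlencode

# Hypothetical values for illustration only.
SECRET_KEY = b"example-secret"
ALLOWED_NETWORKS = [ipaddress.ip_network("127.0.0.0/8")]


def sign_query(params: dict, secret: bytes = SECRET_KEY) -> str:
    """Return an HMAC over the sorted, URL-encoded query parameters."""
    canonical = urlencode(sorted(params.items()))
    return hmac.new(secret, canonical.encode(), hashlib.sha256).hexdigest()


def check_secret_key(params: dict, now: Optional[float] = None) -> bool:
    """Secret-key guard: signature must match and must not have expired."""
    now = time.time() if now is None else now
    unsigned = {k: v for k, v in params.items() if k != "signature"}
    expected = sign_query(unsigned)
    supplied = str(params.get("signature", ""))
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), supplied.encode()):
        return False
    return float(unsigned.get("sigexpiry", 0)) >= now


def check_remote_addr(remote_addr: str) -> bool:
    """Address guard: the client IP must fall inside an allowed network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in ALLOWED_NETWORKS)
```

The address guard only works while requests really originate from the listed networks, whereas the signed request can come from any host that shares the key. That difference is why the choice between the two matters once callers stop being local.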

I don't actually have a preference for special pages. But I do prefer we avoid adding new entry points and re-use index.php or api.php instead.

@mobrovac Could you look at these options and see if the overhead is significant or not?

Alternatively, if it does pose a problem, we could think about adding an /rpc router to MediaWiki (similar to article-path routing for /wiki), and reform the RFC as being about that, with a problem statement alongside and RunJobs/RunSingleJob as the first use case.

daniel moved this task from Inbox to Under discussion on the TechCom-RFC board. Sep 20 2017, 8:43 PM

I'd like to consider re-using existing infrastructure if possible, especially if it means not adding more .php entry points to MediaWiki.
For the rpc/runJobs.php script WMF uses, I'm curious whether it would make sense to consider merging it into core's SpecialRunJobs, which seems rather similar.

Indeed, they both accept the same parameters and perform the exact same thing. If we folded them into Special:RunJobs there would need to be some Puppet work involved as rpc/RunJobs.php is the mechanism used on the job runners to trigger execution rounds. However, given that the job runners contain full MW installs, that shouldn't be that hard to achieve.

The only obvious difference seems the (small?) overhead of index.php title parsing, wiki-routing, and restriction guards using secret keys vs remote addr validation. If we prefer remote_addr as a way of restricting access, we could add an option for that to SpecialRunJobs. Similarly, RunSingleJob could become a special page.

I would actually prefer using secret keys, as for the new job queue infrastructure we will be using LVS, so no request will be local to the job runners. Since the job runner hosts are not exposed to the public, it might be enough to restrict access to their HTTP ports to selected hosts/parts of the infra. Combined with secret keys, that should give us fairly good security.

I don't actually have a preference for special pages. But I do prefer we avoid adding new entry points and re-use index.php or api.php instead.

No preference here either, but api.php does seem slightly more suited to this use case than index.php.

daniel added a subscriber: daniel. Oct 11 2017, 8:59 PM

This RFC is up for discussion on IRC now. If you are interested, please join #wikimedia-office on freenode.

IRC meeting minutes:

https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-10-11-21.01.html

  • https://phabricator.wikimedia.org/T175146 (DanielK_WMDE, 21:01:43)
  • <mobrovac> in our opinion, [RunJobs.php] should be part of the code, instead of considered "config" (DanielK_WMDE, 21:06:39)
  • <SMalyshev> I see the scripts require multiversion - which seems to be uncommon for core code (DanielK_WMDE, 21:06:45)
  • Tim said: this should probably be an API action (but maybe not, since it has a wiki switch) (DanielK_WMDE, 21:08:18)
  • Krinkle's overview of Special:RunJobs vs rpc/RunJobs: https://phabricator.wikimedia.org/T175146#3618503 (DanielK_WMDE, 21:13:02)
  • <TimStarling> RunSingleJob, in the way it takes a JSON specification and produces a JSON result, is almost identical to ParsoidBatchAPI (DanielK_WMDE, 21:14:46)
  • <TimStarling> as for authorization, ParsoidBatchAPI has a client IP filter, like RunSingleJob.php (_joe_, 21:15:08)
  • <Krinkle> As far as I know there was no strong reason for that not to have used Special:RunJobs (DanielK_WMDE, 21:15:22)
  • <_joe_> what about authorization/authentication? <TimStarling> a secret key is probably easier to configure, if the client and server get it from the same configuration (DanielK_WMDE, 21:17:08)
  • <Krinkle> I just wanna make sure switching from remote_addr to a secret token isn't going to be considered a negative thing or a blocker to moving away from our custom RPC endpoint. (DanielK_WMDE, 21:23:22)
  • <Krinkle> Aside from the authentication difference, another difference between RPC and SpecialPage (or API), is that it will need to be invoked with the wikis' canonical hostname. (DanielK_WMDE, 21:24:26)
  • consensus on using a secret key achieved (mobrovac, 21:25:35)
  • <_joe_> I want to stress the security implications of a special page/api action allowing to submit the data of the job, as RunSingleJob.php does (mobrovac, 21:25:58)
  • <_joe_> I want to stress the security implications of a special page/api action allowing to submit the data of the job, as RunSingleJob.php does (DanielK_WMDE, 21:26:06)
  • <TimStarling> my preference would be to introduce a new action API module to core which is very similar to Special:RunJobs (DanielK_WMDE, 21:27:26)
  • Special:RunJobs actually used to be an API action, and was converted to a special page. (Krinkle, 21:28:14)
  • DanielK_WMDE: mobrovac: in my mind, both the special page and the api module should be thin wrappers around a class that implements the actual logic (mobrovac, 21:31:12)
  • <TimStarling> the commit introducing the special page references https://phabricator.wikimedia.org/T64233 (DanielK_WMDE, 21:33:31)
  • The SpecialRunJobs is already abstracted with the JobRunner class and re-used by the core/maintenance/runJobs.php script too (Krinkle, 21:34:40)
  • <TimStarling> the old API action was disconnecting from the client and running jobs in the background (DanielK_WMDE, 21:40:31)
  • <TimStarling> RunSingleJob.php doesn't do that, and doesn't want that, so an API action that works like RunSingleJob is not such a big problem (DanielK_WMDE, 21:40:40)
  • <TimStarling> you don't benefit so much from being an API module if you're overriding the whole output layer (DanielK_WMDE, 21:41:15)
  • <TimStarling> I would prefer an API module but wouldn't block using the special page (DanielK_WMDE, 21:50:07)
  • <Krinkle> the socket we have in MediaWiki.php#triggerAsyncJobs does an fread() until it gets at least 1 byte, and then it closes. (DanielK_WMDE, 22:00:27)
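
The last point in the minutes, reading until at least one byte arrives and then closing, describes a fire-and-forget request. A minimal Python sketch of that client behaviour follows. The function name trigger_async, the request shape and the timeout are assumptions for illustration; this is not the code in MediaWiki.php.

```python
import socket


def trigger_async(host: str, port: int, path: str, timeout: float = 1.0) -> bool:
    """Send a POST and return as soon as the first response byte arrives."""
    request = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    ).encode("ascii")
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            # Wait for a single byte as proof the server accepted the
            # request, then hang up without reading the rest.
            return len(sock.recv(1)) == 1
    except OSError:
        # Connection refused, timeout, or reset: report failure to the caller.
        return False
```

For this to be useful, the server has to keep running the jobs after the client disconnects. The client never sees the outcome of the run, only that the request was received.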

Change 383736 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] SpecialRunJobs: Remove unused JSON response

https://gerrit.wikimedia.org/r/383736

Change 383737 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] SpecialRunJobs: Use normal OutputPage and built-in async mode

https://gerrit.wikimedia.org/r/383737

Change 383736 merged by jenkins-bot:
[mediawiki/core@master] SpecialRunJobs: Remove unused JSON response

https://gerrit.wikimedia.org/r/383736

Change 383737 merged by jenkins-bot:
[mediawiki/core@master] SpecialRunJobs: Use MediaWiki's built-in async/post-send mode

https://gerrit.wikimedia.org/r/383737

Change 384994 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/extensions/EventBus@master] Create SpecialRunSingleJob.php

https://gerrit.wikimedia.org/r/384994

Change 384994 merged by Mobrovac:
[mediawiki/extensions/EventBus@master] Create SpecialRunSingleJob.php

https://gerrit.wikimedia.org/r/384994

Change 385382 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/puppet@production] CP-JobQueue: Use the Special:RunSingleJob page to execute jobs

https://gerrit.wikimedia.org/r/385382

Krinkle closed this task as Resolved. Jan 24 2018, 11:04 PM
Krinkle claimed this task.

This was already approved in October 2017 and implemented since then. We forgot to update its state in Phabricator.

For the record: This task was approved before our updated RFC process introduced a Last Call stage.

Legoktm reopened this task as Open. Jan 25 2018, 5:46 AM

This hasn't been fully implemented yet: SpecialRunSingleJob still lives in the EventBus extension, not MediaWiki core.

Krinkle removed Krinkle as the assignee of this task. Jan 30 2018, 8:45 PM
Krinkle moved this task from Implemented to In progress on the TechCom-RFC (TechCom-Approved) board.
Krinkle moved this task from Untriaged to Meta on the WMF-JobQueue board.
Krinkle moved this task from Meta to Untriaged on the WMF-JobQueue board. Jul 12 2018, 10:54 PM