
Enable MW REST API on job runners and video scalers (for the new rest.php job executor)
Open, Medium, Public

Description

Currently, job runners and video scalers have a highly customized MediaWiki site config that allows calls to /rpc/RunSingleJob.php but disallows calls to the normal MediaWiki entry points.

In order to migrate job execution to the new REST endpoint (T244826), we need to be able to call w/rest.php/eventbus/v0/internal/job/execute on job runners and video scalers. The wiki can be selected via the Host header. So, I guess a normal MW site config should be applied to job runners and video scalers.
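To make the Host-header selection concrete, here is a minimal sketch of how a caller might build such a request. Only the path comes from this task; the endpoint URL, the `build_job_request` helper, and the job payload shape are assumptions made up for illustration, not the actual executor's interface. The request is built but never sent.

```python
import json
import urllib.request

# Path taken from the task description.
JOB_EXECUTE_PATH = "/w/rest.php/eventbus/v0/internal/job/execute"


def build_job_request(endpoint, wiki_host, job):
    """Build (but do not send) a POST request for the REST job executor.

    endpoint  -- base URL of the job runner or video scaler pool (hypothetical)
    wiki_host -- wiki domain placed in the Host header to select the wiki
    job       -- job description; the schema used here is illustrative only
    """
    body = json.dumps(job).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + JOB_EXECUTE_PATH,
        data=body,
        method="POST",
        headers={"Host": wiki_host, "Content-Type": "application/json"},
    )


req = build_job_request(
    "http://jobrunner.example.internal",   # hypothetical endpoint
    "en.wikipedia.org",                    # wiki selected via the Host header
    {"type": "exampleJob", "params": {}},  # hypothetical payload
)
```

The same call against a video scaler endpoint would differ only in the `endpoint` argument, which is what keeps the two pools separate while sharing one entry point.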

For the transitional period we have a rollout plan (T246371), so both access methods should work simultaneously. After the transition is completed, all the special site configuration for job runners can be cleaned up, since both special /rpc job-running scripts will be deleted.

Event Timeline

Change 576913 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] jobrunner: Standard mediawiki webserver configuration

https://gerrit.wikimedia.org/r/576913

There are several issues with this approach:

  • We need to be able to discern videoscaling and normal jobrunning in separated balanced pools. We can probably keep doing so by keeping the IPs separated
  • We need to adapt the timeout to be 20 minutes for jobs and 1 day for videos. I'm not sure how that could be done. Maybe by sending a special header in the request.
  • We need to fix how timeouts are set in mediawiki-config too.
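As a rough illustration of the "special header" idea for timeouts, the selection could look something like the sketch below. The header name `X-Job-Workload` and the `timeout_for` helper are hypothetical; nothing in this task defines them, and the real limit would still have to be enforced in the webserver and PHP configuration rather than in code like this.

```python
# Limits quoted in the list above, in seconds.
JOB_TIMEOUT = 20 * 60          # 20 minutes for ordinary jobs
VIDEO_TIMEOUT = 24 * 60 * 60   # 1 day for video scaling

# Hypothetical header name; nothing in the task defines one.
WORKLOAD_HEADER = "X-Job-Workload"


def timeout_for(headers):
    """Return the execution timeout in seconds for a request's headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    workload = normalised.get(WORKLOAD_HEADER.lower(), "").strip().lower()
    # Anything not explicitly marked as video gets the shorter job limit.
    return VIDEO_TIMEOUT if workload == "video" else JOB_TIMEOUT
```

Defaulting to the shorter limit means a request with a missing or misspelled header cannot accidentally get the one-day timeout.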

> We need to be able to discern videoscaling and normal jobrunning in separated balanced pools. We can probably keep doing so by keeping the IPs separated

Yeah, I'm not asking to join the pools of servers - we're still gonna be calling distinct LVS endpoints and we want the clusters to be separated.

> We need to fix how timeouts are set in mediawiki-config too.

Which timeout are you talking about? We're setting PHP-level timeouts in mediawiki-config; will that approach keep working? Are there any other timeouts at the SRE level?

Krinkle renamed this task from Allow MW REST API to be called on job runners and video scalers to Enable MW REST API on job runners and video scalers (for the new rest.php job executor).Mar 6 2020, 11:06 PM
Krinkle moved this task from Untriaged to EventBus infra on the WMF-JobQueue board.

One concern around harmonisation of configs is that MW hosts have a different base apache2.conf. I've compared the two, and the only notable differences are:

  • Timeout: stock 300, MW 202
  • MaxKeepAliveRequests: stock 100, MW 150
  • KeepAliveTimeout: stock 5, MW 2

Change 576913 abandoned by Hnowlan:

[operations/puppet@production] jobrunner: Standard mediawiki webserver configuration

Reason:

mw-jobrunners will replace this

https://gerrit.wikimedia.org/r/576913