
Create a Beta Cluster version of
Closed, Resolved (Public)


  • Create the new wiki
    • Set up all the necessary changes to CSP rules, logos, etc.
    • Configure it to use the back-end services
  • Create the back-end services
    • Create the box to run them
    • Configure them in LabServices.php
    • Get them to automatically update as new versions of the images are published
    • Expose the back-end services so they can be used remotely in CI for end-to-end testing
  • Ensure the system runs well without major issues

Event Timeline

DVrandecic triaged this task as Medium priority. Jun 2 2021, 4:45 PM
DVrandecic lowered the priority of this task from Medium to Low.
DVrandecic moved this task from To Triage to Phase ζ on the Abstract Wikipedia team board.

Change 714068 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/puppet@production] [WIP] deployment-prep: Add

Mentioned in SAL (#wikimedia-releng) [2021-08-20T16:24:55Z] <majavah> deployment-prep: configure dns zones and add to acme-chief T284162

Change 740789 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] [BETA CLUSTER] Create wikifunctionswiki

Change 740790 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] [BETA CLUSTER] Configure wikifunctionswiki in wikiversions-labs.json

Change 714068 merged by Dzahn:

[operations/puppet@production] deployment-prep: Add

Change 740790 abandoned by Jforrester:

[operations/mediawiki-config@master] [BETA CLUSTER] Configure wikifunctionswiki in wikiversions-labs.json


Change 740789 merged by jenkins-bot:

[operations/mediawiki-config@master] [BETA CLUSTER] Create wikifunctionswiki

Mentioned in SAL (#wikimedia-operations) [2021-11-29T15:51:26Z] <James_F> Running mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=enwiki en wikimedia wikifunctionswiki in Beta Cluster for T284162

Mentioned in SAL (#wikimedia-operations) [2021-11-30T15:12:08Z] <jforrester@deploy1002> Synchronized multiversion/MWMultiVersion.php: Add wikifunctions hard-coded value to setSiteInfoForWiki for Beta Cluster T284162 (duration: 00m 56s)

Change 742740 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] Add WikiLambda to i18n extension list

Change 742740 merged by jenkins-bot:

[operations/mediawiki-config@master] Add WikiLambda to i18n extension list

Change 742756 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] Add initial namespace aliases for Wikifunctions

Change 742818 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] [Beta Cluster] Add project images for Wikifunctions

Change 742756 merged by jenkins-bot:

[operations/mediawiki-config@master] [Beta Cluster] Add initial namespace aliases for Wikifunctions

Change 742818 merged by jenkins-bot:

[operations/mediawiki-config@master] [Beta Cluster] Add project images for Wikifunctions

Don't know if that is expected, but going to results in the following.

Request from - via deployment-cache-text06.deployment-prep.eqiad.wmflabs, ATS/8.0.8
Error: 502, Next Hop Connection Failed at 2021-11-30 23:44:27 GMT

@Jdforrester-WMF anything left to do here?

Sadly, yes. First I need to re-jig the services to actually expose them under a useful name, and check that they're updating themselves, and then do some work to see why the PHP code is so very slow on Beta Cluster when it seems to work 'fine' (not fast, but nothing like this slow) in local development. If you'd like to grab this that'd be smashing, though. :-)

What does "re-jig the services to actually expose them under a useful name" mean?

> What does "re-jig the services to actually expose them under a useful name" mean?

Oh, yes, that wasn't clear at all. :-)

We have the orchestrator and evaluator set up on deployment-docker-wikifunctions01 via the role::beta::docker_services puppet role, with ports 6927 for the evaluator and 6254 for the orchestrator, but they don't appear to be reachable from the MW instances on Beta (so Beta can't work), and we also want them available to the general Internet (so that the default WikiLambda code can point at them, and thus CI can run end-to-end tests using the external services).
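The reachability problem described above can be checked from any instance with a plain TCP connect. The following is only a minimal sketch: the port numbers come from the comment above, while the function names and the host argument are illustrative assumptions, not part of any existing tooling.

```python
import socket

# Ports as stated in the comment above. The host to test is supplied by the
# caller; nothing here assumes a particular host name.
SERVICE_PORTS = {"function-evaluator": 6927, "function-orchestrator": 6254}


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_services(host: str, ports: dict = SERVICE_PORTS) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_reachable(host, port) for name, port in ports.items()}
```

Running `check_services(...)` once from a MediaWiki instance and once from outside the project would distinguish a security-group or proxy problem from a service that is simply not listening.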

In terms of getting things updated, profile::docker::runner::service_defs has image versions from November 2021 still there, so either there isn't an auto-updater for that or I broke it somehow.

> In terms of getting things updated, profile::docker::runner::service_defs has image versions from November 2021 still there, so either there isn't an auto-updater for that or I broke it somehow.

Yeah, there's no latest tag for these images and no auto-updater FWICT, it's just manually set:

Change 784729 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] [Beta Cluster] Correct Wikifunctions service host names

Change 784729 merged by jenkins-bot:

[operations/mediawiki-config@master] [Beta Cluster] Correct Wikifunctions service host names

OK, I've manually rev'ed the version of the images used so these now work from the Beta Cluster boxes with the latest code (but they aren't auto-updating):

jforrester@deployment-deploy03:~$ curl
{"name":"function-evaluator","version":"0.0.1","description":"A Wikifunctions service to evaluate WikiLambda functions", "home":""}
jforrester@deployment-deploy03:~$ curl
{"name":"function-orchestrator","version":"0.0.1","description":"A Wikifunctions service to orchestrate WikiLambda function executors", "home":""}
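The two curl responses above are small JSON documents, so a check like this can be scripted. This is a hedged sketch only: it assumes the response shape shown above (`name`, `version`, `description`, `home`), and the helper names are hypothetical.

```python
import json


def parse_service_info(body: str) -> dict:
    """Parse a service info response and require the fields we rely on."""
    info = json.loads(body)
    missing = [key for key in ("name", "version") if key not in info]
    if missing:
        raise ValueError(f"service info missing fields: {missing}")
    return info


def is_expected_service(body: str, expected_name: str) -> bool:
    """Return True if the response identifies itself as expected_name."""
    return parse_service_info(body)["name"] == expected_name
```

Comparing the reported name against the one expected at each address would catch the two host names being swapped in configuration.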

Automatic tagging of 'latest' should be fixed with
But there is also another issue, which is that the builds have been failing:

> Automatic tagging of 'latest' should be fixed with

Aha, thank you.

> But there is also another issue, which is that the builds have been failing:

might fix that issue; merge and find out?


  • Performed a systemctl daemon-reload on deployment-docker-wikifunctions01 to pick up the Puppet change.
  • Added IPv6 ingress rules for orchestrator/evaluator to the wikifunctions security group.
  • Defined web proxies for the two services:
Service | Public address | Internal address

Note that due to limitations of the cloud software stack, the public port is :443 (HTTPS) in both cases.

Being able to use latest is going to be provided by SRE in

It doesn't fully work. Filed T308598.

OK, T308598 is resolved and the images are now auto-updating.
I set the required env vars for the orchestrator instance with this change:

The orchestrator throws errors on the Beta Cluster because it's unable to get local issuer certificate:

{"name":"function-orchestrator","hostname":"6cba9bb69b55","pid":1,"level":"ERROR","message":"500: internal_error","stack":"FetchError: request to failed, reason: unable to get local issuer certificate\n    at ClientRequest.<anonymous> (/srv/service/node_modules/node-fetch/lib/index.js:1461:11)\n    at ClientRequest.emit (events.js:314:20)\n    at TLSSocket.socketErrorListener (_http_client.js:427:9)\n    at TLSSocket.emit (events.js:314:20)\n    at emitErrorNT (internal/streams/destroy.js:92:8)\n    at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)\n    at processTicksAndRejections (internal/process/task_queues.js:84:21)","status":500,"type":"internal_error","detail":"request to failed, reason: unable to get local issuer certificate","request_id":"54812d10-dc7c-11ec-ae49-bfbd12ea887f","request":{"url":"/1/v1/evaluate/","headers":{"content-type":"application/json","user-agent":"wikifunctions-request/1.39.0-alpha","content-length":"376","x-request-id":"54812d10-dc7c-11ec-ae49-bfbd12ea887f"},"method":"POST","params":{"0":"/1/v1/evaluate/"},"query":{},"remoteAddress":"","remotePort":40836},"levelPath":"error/500","msg":"500: internal_error","time":"2022-05-25T22:45:09.400Z","v":0}

Not sure what the fix for this should be. Should ca-certificates be installed in the docker image?

Filed T309261 for the missing issuer certificates. Temporarily worked around this by setting NODE_TLS_REJECT_UNAUTHORIZED=0 in the function-orchestrator's environment (diff). Things look like they're working now.

Logs from the function-* services are now shipped to Logstash, and I've created a simple dashboard:

(See for access credentials.)
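For anyone triaging these logs outside the dashboard, the services emit one JSON object per line, as in the orchestrator error quoted earlier in this task. Below is a rough sketch for pulling out the useful fields. It assumes only the keys visible in that log line, and the function names are made up for illustration.

```python
import json


def summarize_log_entry(line: str) -> dict:
    """Reduce one JSON log line to the fields most useful for triage."""
    entry = json.loads(line)
    request = entry.get("request", {})
    return {
        "service": entry.get("name"),
        "level": entry.get("level"),
        "status": entry.get("status"),
        "detail": entry.get("detail"),
        "request_id": entry.get("request_id"),
        "url": request.get("url"),
        "time": entry.get("time"),
    }


def is_tls_issuer_error(line: str) -> bool:
    """Return True if the entry reports the missing issuer certificate error."""
    detail = json.loads(line).get("detail") or ""
    return "unable to get local issuer certificate" in detail
```

Counting entries for which `is_tls_issuer_error` is true would show whether the certificate problem tracked in T309261 comes back once the temporary workaround is removed.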

ori updated the task description.