Create a Beta Cluster version of Wikifunctions.org
Closed, Resolved, Public

Description

  • Create the new wiki
    • Set up all the necessary changes to CSP rules, logos, etc.
    • Configure it to use the back-end services
  • Create the back-end services
    • Create the box to run them, deployment-docker-wikifunctions01.deployment-prep.eqiad1.wikimedia.cloud
    • Configure them in LabServices.php
    • Get them to automatically update as new versions of the images are published
    • Expose the back-end services so they can be used remotely in CI for end-to-end testing
  • Ensure the system runs well without major issues

Event Timeline

DVrandecic triaged this task as Medium priority. Jun 2 2021, 4:45 PM
DVrandecic lowered the priority of this task from Medium to Low.
DVrandecic moved this task from To triage to Phase ζ on the Abstract Wikipedia team board.

Change 714068 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/puppet@production] [WIP] deployment-prep: Add wikifunctions.beta.wmflabs.org

https://gerrit.wikimedia.org/r/714068

Mentioned in SAL (#wikimedia-releng) [2021-08-20T16:24:55Z] <majavah> deployment-prep: configure wikifunctions.beta.wmflabs.org dns zones and add to acme-chief T284162

Change 740789 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] [BETA CLUSTER] Create wikifunctionswiki

https://gerrit.wikimedia.org/r/740789

Change 740790 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] [BETA CLUSTER] Configure wikifunctionswiki in wikiversions-labs.json

https://gerrit.wikimedia.org/r/740790

Change 714068 merged by Dzahn:

[operations/puppet@production] deployment-prep: Add wikifunctions.beta.wmflabs.org

https://gerrit.wikimedia.org/r/714068

Change 740790 abandoned by Jforrester:

[operations/mediawiki-config@master] [BETA CLUSTER] Configure wikifunctionswiki in wikiversions-labs.json

Reason:

https://gerrit.wikimedia.org/r/740790

Change 740789 merged by jenkins-bot:

[operations/mediawiki-config@master] [BETA CLUSTER] Create wikifunctionswiki

https://gerrit.wikimedia.org/r/740789

Mentioned in SAL (#wikimedia-operations) [2021-11-29T15:51:26Z] <James_F> Running mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=enwiki en wikimedia wikifunctionswiki wikifunctions.beta.wmflabs.org in Beta Cluster for T284162

Mentioned in SAL (#wikimedia-operations) [2021-11-30T15:12:08Z] <jforrester@deploy1002> Synchronized multiversion/MWMultiVersion.php: Add wikifunctions hard-coded value to setSiteInfoForWiki for Beta Cluster T284162 (duration: 00m 56s)

Change 742740 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] Add WikiLambda to i18n extension list

https://gerrit.wikimedia.org/r/742740

Change 742740 merged by jenkins-bot:

[operations/mediawiki-config@master] Add WikiLambda to i18n extension list

https://gerrit.wikimedia.org/r/742740

Change 742756 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] Add initial namespace aliases for Wikifunctions

https://gerrit.wikimedia.org/r/742756

Change 742818 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] [Beta Cluster] Add project images for Wikifunctions

https://gerrit.wikimedia.org/r/742818

Change 742756 merged by jenkins-bot:

[operations/mediawiki-config@master] [Beta Cluster] Add initial namespace aliases for Wikifunctions

https://gerrit.wikimedia.org/r/742756

Change 742818 merged by jenkins-bot:

[operations/mediawiki-config@master] [Beta Cluster] Add project images for Wikifunctions

https://gerrit.wikimedia.org/r/742818

Don't know if that is expected, but going to https://wikifunctions.beta.wmflabs.org/wiki/Special:Log results in the following.

Request from - via deployment-cache-text06.deployment-prep.eqiad.wmflabs, ATS/8.0.8
Error: 502, Next Hop Connection Failed at 2021-11-30 23:44:27 GMT

@Jdforrester-WMF anything left to do here?

Sadly, yes. First I need to re-jig the services to actually expose them under a useful name, and check that they're updating themselves, and then do some work to see why the PHP code is so very slow on Beta Cluster when it seems to work 'fine' (not fast, but nothing like this slow) in local development. If you'd like to grab this that'd be smashing, though. :-)

What does "re-jig the services to actually expose them under a useful name" mean?

Oh, yes, that wasn't clear at all. :-)

We have the orchestrator and evaluator set up on deployment-docker-wikifunctions01 via the role::beta::docker_services puppet role, with ports 6927 for the evaluator and 6254 for the orchestrator, but they don't appear to be reachable from the MW instances on Beta (so Beta can't work), and we also want them available to the general Internet (so that the default WikiLambda code can point at them, and thus CI can run end-to-end tests using the external services).
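The reachability problem described above can be checked from any host with a plain TCP connection test. The following is a minimal sketch, not part of the original task: the ports are the ones quoted in the comment, the host name is the instance named in the task description, and the helper names `is_reachable` and `check_services` are hypothetical.

```python
import socket

# Ports as quoted in the comment above; the host is the instance from the
# task description. Both are assumptions if the deployment has since changed.
HOST = "deployment-docker-wikifunctions01.deployment-prep.eqiad1.wikimedia.cloud"
SERVICES = {"function-evaluator": 6927, "function-orchestrator": 6254}


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_services(host: str = HOST, services: dict = SERVICES,
                   timeout: float = 3.0) -> dict:
    """Map each service name to whether its port accepted a connection."""
    return {name: is_reachable(host, port, timeout)
            for name, port in services.items()}
```

Calling `check_services()` from a MediaWiki instance on Beta and again from an outside machine would show which side of the network the services are visible from. A successful TCP connection only shows that the port is open, not that the service behind it is healthy.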

In terms of getting things updated, profile::docker::runner::service_defs has image versions from November 2021 still there, so either there isn't an auto-updater for that or I broke it somehow.

Yeah, there's no latest tag for these images and no auto-updater FWICT, it's just manually set: https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+blame/master/deployment-prep/deployment-docker-wikifunctions01.deployment-prep.eqiad1.wikimedia.cloud.yaml

Change 784729 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/mediawiki-config@master] [Beta Cluster] Correct Wikifunctions service host names

https://gerrit.wikimedia.org/r/784729

Change 784729 merged by jenkins-bot:

[operations/mediawiki-config@master] [Beta Cluster] Correct Wikifunctions service host names

https://gerrit.wikimedia.org/r/784729

OK, I've manually rev'ed the version of the images used so these now work from the Beta Cluster boxes with the latest code (but they aren't auto-updating):

jforrester@deployment-deploy03:~$ curl deployment-docker-wikifunctions01.deployment-prep.eqiad1.wikimedia.cloud:6927/_info
{"name":"function-evaluator","version":"0.0.1","description":"A Wikifunctions service to evaluate WikiLambda functions", "home":"http://meta.wikimedia.org/wiki/Abstract%20Wikipedia"}
jforrester@deployment-deploy03:~$ curl deployment-docker-wikifunctions01.deployment-prep.eqiad1.wikimedia.cloud:6254/_info
{"name":"function-orchestrator","version":"0.0.1","description":"A Wikifunctions service to orchestrate WikiLambda function executors", "home":"http://meta.wikimedia.org/wiki/Abstract%20Wikipedia"}
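The same `_info` check can be scripted. This is a minimal sketch assuming only what the curl output above shows: that `/_info` returns a JSON object with a `name` field. The function names `fetch_info` and `is_expected_service` are hypothetical.

```python
import json
import urllib.request


def fetch_info(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON document served at <base_url>/_info."""
    with urllib.request.urlopen(f"{base_url}/_info", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_expected_service(info: dict, expected_name: str) -> bool:
    """Check that the _info payload identifies the service we expect."""
    return info.get("name") == expected_name
```

For example, `is_expected_service(fetch_info("http://deployment-docker-wikifunctions01.deployment-prep.eqiad1.wikimedia.cloud:6927"), "function-evaluator")` would confirm that the evaluator, and not some other service, is answering on port 6927. The `http://` scheme is an assumption based on the curl commands being run without one.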

Automatic tagging of 'latest' should be fixed with https://gerrit.wikimedia.org/r/c/integration/config/+/788806/
But there is also another issue, which is that the builds have been failing:
https://integration.wikimedia.org/ci/job/service-pipeline-test-and-publish/2581/console

Aha, thank you.

https://gerrit.wikimedia.org/r/c/mediawiki/services/function-orchestrator/+/789136 might fix that issue; merge and find out?

Progress:

  • Performed a systemctl daemon-reload on deployment-docker-wikifunctions01 to pick up the Puppet change.
  • Added IPv6 ingress rules for the orchestrator and evaluator to the wikifunctions security group.
  • Defined web proxies for the two services:
    Service        Public address                                    Internal address
    Orchestrator   wikifunctions-orchestrator-beta.wmflabs.org:443   deployment-docker-wikifunctions01:6254
    Evaluator      wikifunctions-evaluator-beta.wmflabs.org:443      deployment-docker-wikifunctions01:6927

Note that due to limitations of the cloud software stack, the public port is :443 (HTTPS) in both cases.
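A client that has to work both inside and outside the Beta Cluster could pick its endpoint from the proxy table above. This is an illustrative sketch only: the addresses are copied from the table, the `http://` scheme for the internal addresses is an assumption (the earlier curl commands used no scheme), and `service_url` is a hypothetical helper.

```python
# Addresses copied from the proxy table; the public side is always HTTPS on 443.
ENDPOINTS = {
    "orchestrator": {
        "public": "https://wikifunctions-orchestrator-beta.wmflabs.org",
        # Assumption: internal traffic is plain HTTP on the service port.
        "internal": "http://deployment-docker-wikifunctions01:6254",
    },
    "evaluator": {
        "public": "https://wikifunctions-evaluator-beta.wmflabs.org",
        "internal": "http://deployment-docker-wikifunctions01:6927",
    },
}


def service_url(service: str, inside_beta_cluster: bool) -> str:
    """Return the base URL for a service, depending on where the caller runs."""
    if service not in ENDPOINTS:
        raise ValueError(f"unknown service: {service!r}")
    side = "internal" if inside_beta_cluster else "public"
    return ENDPOINTS[service][side]
```

CI jobs running end-to-end tests from outside would call `service_url("orchestrator", inside_beta_cluster=False)`, while code running on a Beta Cluster host would pass `True`.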

Being able to use latest is going to be provided by SRE in https://gerrit.wikimedia.org/r/c/operations/puppet/+/789846

It doesn't fully work. Filed T308598.

OK, T308598 is resolved and the images are now auto-updating.
I set the required env vars for the orchestrator instance with this change:
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/c0409f7e158288b90def965254fa1e0d19a7618a%5E%21/#F0

The orchestrator throws errors on the Beta Cluster because it is unable to get the local issuer certificate:

{"name":"function-orchestrator","hostname":"6cba9bb69b55","pid":1,"level":"ERROR","message":"500: internal_error","stack":"FetchError: request to https://wikifunctions.beta.wmflabs.org/w/api.php?action=wikilambda_fetch&format=json&zids=Z7%7CZ9 failed, reason: unable to get local issuer certificate\n    at ClientRequest.<anonymous> (/srv/service/node_modules/node-fetch/lib/index.js:1461:11)\n    at ClientRequest.emit (events.js:314:20)\n    at TLSSocket.socketErrorListener (_http_client.js:427:9)\n    at TLSSocket.emit (events.js:314:20)\n    at emitErrorNT (internal/streams/destroy.js:92:8)\n    at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)\n    at processTicksAndRejections (internal/process/task_queues.js:84:21)","status":500,"type":"internal_error","detail":"request to https://wikifunctions.beta.wmflabs.org/w/api.php?action=wikilambda_fetch&format=json&zids=Z7%7CZ9 failed, reason: unable to get local issuer certificate","request_id":"54812d10-dc7c-11ec-ae49-bfbd12ea887f","request":{"url":"/1/v1/evaluate/","headers":{"content-type":"application/json","user-agent":"wikifunctions-request/1.39.0-alpha","content-length":"376","x-request-id":"54812d10-dc7c-11ec-ae49-bfbd12ea887f"},"method":"POST","params":{"0":"/1/v1/evaluate/"},"query":{},"remoteAddress":"172.16.3.203","remotePort":40836},"levelPath":"error/500","msg":"500: internal_error","time":"2022-05-25T22:45:09.400Z","v":0}

Not sure what the fix for this should be. Should ca-certificates be installed in the docker image?
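Log entries like the one above are single-line JSON objects, so the failure reason can be pulled out mechanically when scanning many of them. This is a minimal sketch that relies only on fields visible in that entry (`name`, `level`, `status`, `detail`, `request_id`); the function name `summarize_error` is hypothetical.

```python
import json


def summarize_error(log_line: str) -> dict:
    """Reduce one JSON log line to the fields needed for triage."""
    entry = json.loads(log_line)
    detail = entry.get("detail", "")
    # The detail string has the shape "request to <url> failed, reason: <reason>".
    _, marker, reason = detail.partition("failed, reason: ")
    return {
        "service": entry.get("name"),
        "level": entry.get("level"),
        "status": entry.get("status"),
        # Fall back to the whole detail string if the marker is absent.
        "reason": reason if marker else detail,
        "request_id": entry.get("request_id"),
    }
```

Applied to the entry above, the `reason` field would read "unable to get local issuer certificate", which makes it easy to group TLS trust failures separately from other 500 errors.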

Filed T309261 for the missing issuer certificates. Temporarily worked around this by setting NODE_TLS_REJECT_UNAUTHORIZED=0 in the function-orchestrator's environment (diff). Things look like they're working now.

Logs from the function-* services are now shipped to Logstash, and I've created a simple dashboard:

https://beta-logs.wmcloud.org/app/dashboards#/view/6047da50-e0ef-11ec-9d5f-9f290f4ecdda

(See https://wikitech.wikimedia.org/wiki/Logstash#Beta_Cluster_Logstash for access credentials.)

ori updated the task description.