
Rename mpic.wikimedia.org
Open, In Progress, Medium, Public

Description

Acceptance Criteria

Notes

  • External callers (Beta cluster, wmftkbot) have to use the public domain because the Cloud VPS network is external to the WMF production network. There is no way to reach discovery.wmnet domains from there.

Details

Other Assignee
brouberol
Related Changes in Gerrit:
Repo | Branch | Lines +/-
operations/mediawiki-config | master | +50 -2
operations/mediawiki-config | master | +3 -3
operations/puppet | production | +1 -1
operations/deployment-charts | master | +1 -1
operations/deployment-charts | master | +1 -1
operations/puppet | production | +0 -24
operations/puppet | production | +2 -2
operations/puppet | production | +0 -8
operations/puppet | production | +3 -3
operations/puppet | production | +1 -1
operations/puppet | production | +2 -2
operations/puppet | production | +1 -3
operations/mediawiki-config | master | +2 -2
operations/puppet | production | +3 -3
operations/deployment-charts | master | +1 -1
operations/deployment-charts | master | +2 -2
operations/puppet | production | +5 -0
operations/puppet | production | +2 -2
operations/puppet | production | +4 -4
operations/deployment-charts | master | +4 -0
operations/deployment-charts | master | +0 -2
operations/deployment-charts | master | +0 -4
operations/deployment-charts | master | +1 -1
operations/deployment-charts | master | +5 -5
operations/deployment-charts | master | +5 -5
operations/deployment-charts | master | +6 -0
operations/puppet | production | +6 -6
operations/dns | master | +6 -6
operations/puppet | production | +24 -0
operations/puppet | production | +8 -0
operations/dns | master | +6 -0
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Updated new domain for Test Kitchen API | repos/product-analytics/experimentation-lab/experiment-analytics-configs!32 | sfaci | T407805-rename-mpic.wikimedia.org | main
Renamed the default experiments config URI with the new domain | repos/data-engineering/airflow-dags!1871 | sfaci | T407805-rename-mpic.wikimedia.org | main
deployment-calendar: Rename deployment windows related to Test Kitchen | repos/releng/release!223 | sfaci | T407805-rename-mpic | main
A few renaming for the more visible things to users: alpha status, logo and links to documentation | repos/data-engineering/mpic!275 | sfaci | T407805-rename-mpic | main

Event Timeline


Change #1212418 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] testkicthen: allow reaching out to the mpic app via testkitchen.w.o

https://gerrit.wikimedia.org/r/1212418

Change #1212419 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] testkitchen: add the additional testkitchen.w.o domain to the ingress gateway hosts

https://gerrit.wikimedia.org/r/1212419

Change #1212427 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Define the testkitchen kubeconfigs

https://gerrit.wikimedia.org/r/1212427

Change #1212420 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] testkitchen-next: set the OIDC callback URL doimain to testkitchen-next.w.o

https://gerrit.wikimedia.org/r/1212420

Change #1212428 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Define the testkitchen services

https://gerrit.wikimedia.org/r/1212428

Change #1212421 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] testkitchen: set the OIDC callback URL domain to testkitchen.w.o

https://gerrit.wikimedia.org/r/1212421

Change #1212430 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] testkitchen: allow public access from the internet

https://gerrit.wikimedia.org/r/1212430

Change #1212422 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] Rename mpic-next service to testkitchen-next

https://gerrit.wikimedia.org/r/1212422

Change #1212432 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] testkitchen: drop mpic.w.o from OIDC service

https://gerrit.wikimedia.org/r/1212432

Change #1212423 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] Rename mpic service to testkitchen

https://gerrit.wikimedia.org/r/1212423

Change #1212433 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] testkitchen: rename the OIDC services

https://gerrit.wikimedia.org/r/1212433

Change #1212424 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] testkitchen: drop the mpic.w.o SANs from the certificate

https://gerrit.wikimedia.org/r/1212424

Change #1212434 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] mpic: delete kubeconfigs

https://gerrit.wikimedia.org/r/1212434

Change #1212425 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] testkitchen: drop the mpic.w.o domains from the ingress gateways

https://gerrit.wikimedia.org/r/1212425

Change #1212435 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Move mpic service mesh entry to testkitchen

https://gerrit.wikimedia.org/r/1212435

Change #1212426 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] testkitchen: rename the OIDC services

https://gerrit.wikimedia.org/r/1212426

Change #1212436 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] mpic: delete services from service list

https://gerrit.wikimedia.org/r/1212436

So we can rename the public domain and point wmftkbot at the new one. The other callers probably need no further work, unless we also want to rename the internal domain and the related local service to `testkitchen`.

As agreed in our Slack conversation, the app will be configured to respond on both the old (mpic/mpic-next) and new (testkitchen/testkitchen-next) names, for the public and discovery domains alike. Once that is done, we can point all callers at the new names (testkitchen, testkitchen-next) and then remove the old ones (mpic, mpic-next).

Change #1212437 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] testkitchen: reconfigure the OIDC service ids to support 2 domains

https://gerrit.wikimedia.org/r/1212437

Change #1212438 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] testkitchen-next: drop mpic-next.w.o from OIDC service

https://gerrit.wikimedia.org/r/1212438

Change #1212400 merged by Brouberol:

[operations/dns@master] Define the testkitchen discovery and public recoords

https://gerrit.wikimedia.org/r/1212400

Change #1212427 merged by Brouberol:

[operations/puppet@production] Define the testkitchen kubeconfigs

https://gerrit.wikimedia.org/r/1212427

Change #1212428 merged by Brouberol:

[operations/puppet@production] Define the testkitchen services

https://gerrit.wikimedia.org/r/1212428

While starting on that, we ended up discussing whether the new name should be test-kitchen instead of testkitchen. We will wait until next week, when we can discuss it as a team, so all the patches above are paused until we reach a clear decision.

In the meantime, I have started looking into how callers talk to mpic, so we know what to change when pointing them at the new public/internal domain.

Change #1213271 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/dns@master] Rename testkitchen domains to test-kitchen

https://gerrit.wikimedia.org/r/1213271

Change #1213274 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Define the test-kitchen kubeconfigs

https://gerrit.wikimedia.org/r/1213274

Change #1213275 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Define the test-kitchen services

https://gerrit.wikimedia.org/r/1213275

Change #1213271 merged by Brouberol:

[operations/dns@master] Rename testkitchen domains to test-kitchen

https://gerrit.wikimedia.org/r/1213271

Change #1213274 merged by Brouberol:

[operations/puppet@production] Define the test-kitchen kubeconfigs

https://gerrit.wikimedia.org/r/1213274

Change #1212418 merged by Brouberol:

[operations/deployment-charts@master] test-kitchen: allow reaching out to the mpic app via test-kitchen.w.o

https://gerrit.wikimedia.org/r/1212418

Change #1212419 merged by Brouberol:

[operations/deployment-charts@master] test-kitchen: add the additional test-kitchen.w.o domain to the ingress gateway hosts

https://gerrit.wikimedia.org/r/1212419

Change #1213275 merged by Brouberol:

[operations/puppet@production] Define the test-kitchen services

https://gerrit.wikimedia.org/r/1213275

Change #1212437 merged by Brouberol:

[operations/puppet@production] test-kitchen-next: reconfigure the OIDC service ids to support 2 domains

https://gerrit.wikimedia.org/r/1212437

Change #1212430 merged by Brouberol:

[operations/puppet@production] test-kitchen-next: allow public access from the internet

https://gerrit.wikimedia.org/r/1212430

Change #1212420 merged by Brouberol:

[operations/deployment-charts@master] test-kitchen-next: set the OIDC callback URL domain to test-kitchen-next.w.o

https://gerrit.wikimedia.org/r/1212420

sfaci opened https://gitlab.wikimedia.org/repos/data-engineering/mpic/-/merge_requests/275

A few renaming for the more visible things to users: alpha status, logo and links to documentation

Change #1213489 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Redirect mpic-next.w.o to test-kitchen-next.w.o

https://gerrit.wikimedia.org/r/1213489

Change #1213489 merged by Brouberol:

[operations/puppet@production] Redirect mpic-next.w.o to test-kitchen-next.w.o

https://gerrit.wikimedia.org/r/1213489

Change #1213505 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] Redirect mpic.w.o to test-kitchen.w.o

https://gerrit.wikimedia.org/r/1213505

@mpopov @dr0ptp4kt As part of this work we are considering enabling the new domain, test-kitchen.wikimedia.org, and redirecting the old one to it (mpic.wikimedia.org -> test-kitchen.wikimedia.org) to give users and callers a smooth transition. We have found that the wmftkbot and experiment-analytics-config codebases (and the related DAG) use the public domain to fetch instrument/experiment configurations. Can you confirm whether they would follow the redirect if we enable it? The plan would be something like the following:

  • Enable the new domain (test-kitchen.wikimedia.org)
  • Enable the redirect (mpic.wikimedia.org -> test-kitchen.wikimedia.org). At this point everything would work regardless of whether users/callers use the old or the new domain (as long as they follow redirects)
  • Update callers to use the new domain

A related question: could the components mentioned above use our internal discovery domain? There is currently an mpic.discovery.wmnet domain (test-kitchen.discovery.wmnet for the new name) that components like Varnish use instead of the public one (other components consume the API as a local service). Can wmftkbot and experiment-analytics-config use it? I don't know whether they can reach it.

If the redirect is a blocker for this plan, there is a Plan B: enable the new domain without it, update these callers to use the new domain, and only then enable the redirect so users get a smooth transition on the UI side. I'm also wondering whether these changes are easy enough to make this Plan A. I'm already preparing the corresponding MRs in case they are.

Note: the Test Kitchen UI has a callback URL configured to make the CAS login mechanism work. That's why the redirect lets us provide a smooth transition for the login/UI part. Without it, only one domain (old or new) can work for login: the one configured as the callback URL. The API part works either way (with or without the redirect).
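To make the intended redirect behaviour concrete, here is a minimal Python sketch of the host mapping it would implement: the hostname is swapped and the path and query string are kept. This is only an illustration; the real redirect lives in the Puppet-managed configuration, and `HOST_MAP` and `redirect_target` are hypothetical names that do not exist in any of the repositories touched by this task.

```python
from urllib.parse import urlsplit, urlunsplit

# Old -> new public hostnames, as named in this task.
# HOST_MAP and redirect_target are hypothetical, for illustration only.
HOST_MAP = {
    "mpic.wikimedia.org": "test-kitchen.wikimedia.org",
    "mpic-next.wikimedia.org": "test-kitchen-next.wikimedia.org",
}


def redirect_target(url: str) -> str:
    """Return the URL a redirect would send the client to.

    URLs on hosts that are not being renamed are returned unchanged.
    Any userinfo in the URL is dropped; scheme, port, path, query and
    fragment are preserved.
    """
    parts = urlsplit(url)
    new_host = HOST_MAP.get(parts.hostname or "")
    if new_host is None:
        return url
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

A caller that follows redirects ends up at the mapped URL without any change on its side, which is why the plan works for both users and API clients as long as they support redirects.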

sfaci opened https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/223

Renamed name and description of deployment windows related to Test Kitchen

mpic.discovery.wmnet is unreachable from the tool bastion under tool user tk (the ID used for wmftkbot), both for ping and for a request such as https://mpic.discovery.wmnet:30443/api/v1/experiments?authority=varnish&format=config. The same unreachability shows up when running the check via the k8s infrastructure that the jobs run on:

$ toolforge jobs run check-internal-ping --command "python3 -c \"import socket; print('open' if (lambda s: (s.connect(('mpic.discovery.wmnet', 30443)) or True))(socket.socket()) else 'closed')\"" --image python3.13 --wait

$ cat check-internal-ping.err
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import socket; print('open' if (lambda s: (s.connect(('mpic.discovery.wmnet', 30443)) or True))(socket.socket()) else 'closed')
                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "<string>", line 1, in <lambda>
    import socket; print('open' if (lambda s: (s.connect(('mpic.discovery.wmnet', 30443)) or True))(socket.socket()) else 'closed')
                                               ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: [Errno 110] Connection timed out
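For anyone repeating this check, the one-liner above is easier to read as a small function. The sketch below is a rough equivalent, not the exact command that was run: `is_port_open` is a hypothetical helper name, and it returns False on any connection failure (DNS error, refusal, timeout) instead of raising.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and connects; the context
        # manager closes the socket again as soon as we know it worked.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers DNS failures, refused connections and timeouts.
        return False


# Example call with the host and port from the check above:
#   print("open" if is_port_open("mpic.discovery.wmnet", 30443) else "closed")
```

With an explicit timeout the job fails fast instead of waiting for the kernel's default connect timeout.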

That said, the Python script uses requests.get, so in theory it should handle the redirect seamlessly. If it doesn't, it should just log that it's having problems; it runs on a sleep cycle, so it shouldn't add too much to the log file if it behaves unexpectedly.

I've added an item to T411035: wmftkbot maintenance release to point to the new domain (once it's in place and routing requests) just to save the little bit of overhead of redirection, nonetheless!
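On the redirect point: requests.get follows redirects by default (allow_redirects=True). The sketch below shows the same client-side behaviour offline, using only the standard library as a stand-in for requests: a throwaway local server answers /old/... with a 301 to /new/..., and urllib follows it. The paths, the handler and `fetch_via_redirect` are made up for the demonstration and have nothing to do with the real service.

```python
import http.server
import threading
import urllib.request


class RedirectingHandler(http.server.BaseHTTPRequestHandler):
    """Toy handler: /old/<x> redirects to /new/<x>; everything else returns 200."""

    def do_GET(self):
        if self.path.startswith("/old"):
            self.send_response(301)
            self.send_header("Location", "/new" + self.path[len("/old"):])
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"ok from new"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demonstration quiet.
        pass


def fetch_via_redirect() -> tuple[str, bytes]:
    """Request the old path and return (final URL, response body)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        # An empty ProxyHandler keeps any proxy environment variables out of the way.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/old/config", timeout=5) as resp:
            return resp.geturl(), resp.read()
    finally:
        server.shutdown()
        server.server_close()
```

The caller asks for the old path but gets the body served from the new one, at the cost of one extra round trip per request, which is the overhead the maintenance item would remove.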

jforrester merged https://gitlab.wikimedia.org/repos/releng/release/-/merge_requests/223

deployment-calendar: Rename deployment windows related to Test Kitchen

Change #1214585 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[operations/mediawiki-config@master] LabsService: Rename mpic-next domain

https://gerrit.wikimedia.org/r/1214585

Change #1214615 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[operations/mediawiki-config@master] wmgLocalServices: Renamed `mpic` to `test-kitchen` local service

https://gerrit.wikimedia.org/r/1214615

Per our Slack conversation, we agreed on the following plan for the renaming work:

  • Enable the new domain test-kitchen.wikimedia.org
  • Do nothing about the redirect for now (users won't be able to use the UI on the new domain yet, but they can still use it at mpic.wikimedia.org)
  • Update the callers that use the public domain (wmftkbot and experiment-analytics-config). Callers can use the new domain because the API doesn't use the login mechanism
  • Enable the redirect mpic.wikimedia.org -> test-kitchen.wikimedia.org (users will then be able to reach the UI via both domains)
  • Keep updating the rest of the callers (they don't use the public domain; they reach the API via the internal discovery domain or the local service)

In staging, some of this work is already done: test-kitchen-next.wikimedia.org is enabled and mpic-next.wikimedia.org already points to it. That means the new domain is fully working in staging (login/UI + API). This lets us work on other related changes (see T407808: Rename xLab to test kitchen) and test them in staging while we enable the new domain and update callers in production.

Change #1214615 abandoned by Santiago Faci:

[operations/mediawiki-config@master] wmgLocalServices: Renamed `mpic` to `test-kitchen` local service

Reason:

I'll do this change along with the corresponding one for LabsServices

https://gerrit.wikimedia.org/r/1214615

cjming merged https://gitlab.wikimedia.org/repos/data-engineering/mpic/-/merge_requests/275

A few renaming for the more visible things to users: alpha status, logo and links to documentation

Change #1213505 abandoned by Brouberol:

[operations/puppet@production] Redirect mpic.w.o to test-kitchen.w.o

Reason:

No longer required, according to @Santi

https://gerrit.wikimedia.org/r/1213505

@dr0ptp4kt: Regarding the following

I've added an item to T411035: wmftkbot maintenance release to point to the new domain (once it's in place and routing requests) just to save the little bit of overhead of redirection, nonetheless!

The new domain, test-kitchen.wikimedia.org, is already available. wmftkbot can point to it for API requests (the login/UI will keep working only at mpic.wikimedia.org until we update all the callers).

Thanks @Sfaci. wmftkbot is now pointed at test-kitchen.wikimedia.org via its local config.yaml and a restart of the tk tool's continuous job on Toolforge, and it looks to be in working order.

Change #1217244 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[operations/deployment-charts@master] Test Kitchen UI: Deploying v1.1.4 release to staging

https://gerrit.wikimedia.org/r/1217244

Thanks @Sfaci. wmftkbot is now pointed at test-kitchen.wikimedia.org via its local config.yaml and a restart of the tk tool's continuous job on Toolforge, and it looks to be in working order.

Cool! Thanks Adam! Do you mind if I update the toolforge branch accordingly?

Change #1217246 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[operations/deployment-charts@master] Test Kitchen UI: Deploying v1.1.4 release to staging Test Kitchen UI: Deploying v1.1.4 release to production

https://gerrit.wikimedia.org/r/1217246

Thanks @Sfaci. wmftkbot is now pointed at test-kitchen.wikimedia.org via its local config.yaml and a restart of the tk tool's continuous job on Toolforge, and it looks to be in working order.

Cool! Thanks Adam! Do you mind if I update the toolforge branch accordingly?

No, not at all, feel free to modify as one of the items with Bug: T411035 on that branch. I don't imagine being able to attend to that ticket until Friday afternoon or maybe sometime next week...possibly post the break (I'll be out two weeks). Although probably anyone could take it. Anyway, a little touchup is good! Thank you!

Change #1217244 merged by jenkins-bot:

[operations/deployment-charts@master] Test Kitchen UI: Deploying v1.1.4 release to staging

https://gerrit.wikimedia.org/r/1217244

Change #1217246 merged by jenkins-bot:

[operations/deployment-charts@master] Test Kitchen UI: Deploying v1.1.4 release to production

https://gerrit.wikimedia.org/r/1217246

Change #1217487 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[operations/puppet@production] wmfuniq_experiment_fetcher: Update TestKitchen API domain

https://gerrit.wikimedia.org/r/1217487

Change #1217487 merged by Vgutierrez:

[operations/puppet@production] wmfuniq_experiment_fetcher: Update TestKitchen API domain

https://gerrit.wikimedia.org/r/1217487

After confirming with Balthazar, I have removed a couple of unneeded ACs from the "Update callers" item:

Those kinds of changes can be considered "internal machinery", and we don't need to update them for callers to be able to call test-kitchen. Callers that use the service as "local" reach it via localhost:6037, not via the service name.

Change #1214585 abandoned by Santiago Faci:

[operations/mediawiki-config@master] Rename `mpic` local service to `test-kitchen` because of the platform renaming

Reason:

Abandoning in favour of https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1217360 which is a patch that will cover the changes done here and some others we also need

https://gerrit.wikimedia.org/r/1214585

Change #1217360 had a related patch set uploaded (by Santiago Faci; author: Clare Ming):

[operations/mediawiki-config@master] Deploy TestKitchen to Beta Cluster

https://gerrit.wikimedia.org/r/1217360