Automated validation of mediawiki-multiversion images
Closed, Resolved (Public)

Description

After the build process creates the restricted mediawiki-multiversion image, we want automated tests to be run on it to perform validation before deployment.

Requirements:

  • A place to run the image which has access to production resources.
  • A set of tests to run. Using httpbb is a good starting point.

Event Timeline

dancy triaged this task as Medium priority. Aug 11 2021, 3:45 PM

A place to run the image which has access to production resources.

The staging cluster is available for this.

A set of tests to run. Using httpbb is a good starting point.

I believe there's also a Swagger spec that we currently use; @Krinkle knows more details.

scap also invokes mwscript eval.php --wiki=enwiki to catch obvious fatals, see https://gerrit.wikimedia.org/g/mediawiki/tools/scap/+/7396e6f2adfa7b7f4bf12ed3fa2104f8c12d3355/scap/main.py#208
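
The check described above amounts to running a command and treating a non-zero exit status as a failure. The following is a minimal sketch of that idea, not scap's actual implementation: the helper name `smoke_check` and the timeout value are assumptions, and the Python interpreter stands in for `mwscript` so the example runs anywhere.

```python
import subprocess
import sys


def smoke_check(cmd, stdin_text="", timeout=60):
    """Run cmd and return (ok, output); ok is False on a non-zero exit status."""
    result = subprocess.run(
        cmd,
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0, result.stdout + result.stderr


# Stand-in command (assumption): a trivial script run by the current interpreter.
# On a host with MediaWiki tooling, the command from the comment above would be
# ["mwscript", "eval.php", "--wiki=enwiki"].
ok, output = smoke_check([sys.executable, "-c", "print('no fatals')"])
print("PASS" if ok else "FAIL", output.strip())
```

Returning the captured output alongside the boolean lets the caller log the error text when the check fails.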

Change 721615 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] role::releases: Include ::profile::kubernetes::deployment_server

https://gerrit.wikimedia.org/r/721615

Change 721615 merged by Dzahn:

[operations/puppet@production] role::releases: Add profiles needed for image testing

https://gerrit.wikimedia.org/r/721615

Change 721860 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] releases: Install private data for mwdebug service

https://gerrit.wikimedia.org/r/721860

Change 721860 merged by Dzahn:

[operations/puppet@production] releases: Install private data for mwdebug service

https://gerrit.wikimedia.org/r/721860

Change 722910 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[labs/private@master] Add mwdebug secrets to releases servers

https://gerrit.wikimedia.org/r/722910

Change 722910 merged by Alexandros Kosiaris:

[labs/private@master] Add mwdebug secrets to releases servers

https://gerrit.wikimedia.org/r/722910

While refactoring the Kubernetes Puppet code, I noticed that we place credentials for the Kubernetes staging cluster on the CI and releases servers. My trail was:

I don't see helm defaults being installed to releases or ci nodes since the last change (Sept. 2021) which makes me wonder: Are the staging cluster credentials still required for both roles?

role::releases: Creds no longer needed
ci::master: There are still some Gerrit repos with a Jenkins CI pipeline that installs a helm chart to the staging cluster using KUBECONFIG=/etc/kubernetes/ci-staging.config. We (Release-Engineering-Team) are currently migrating repos of this type to GitLab, but we'll still need the creds on the CI node in the meantime.
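
For illustration, a pipeline step of this kind could point helm at the staging cluster through the `KUBECONFIG` environment variable. This is a rough sketch and not the actual Jenkins job: the release name and chart path are hypothetical, and the `ci` namespace is taken from a later comment in this thread. Only the kubeconfig path comes from the comment above.

```python
import os


def helm_install_invocation(release, chart, namespace, kubeconfig):
    """Return (argv, env) for a helm install that targets the cluster in kubeconfig."""
    # Copy the current environment and override only KUBECONFIG.
    env = dict(os.environ, KUBECONFIG=kubeconfig)
    argv = ["helm", "upgrade", "--install", release, chart, "--namespace", namespace]
    return argv, env


argv, env = helm_install_invocation(
    release="example-release",  # hypothetical release name
    chart="./chart",  # hypothetical chart path
    namespace="ci",  # namespace mentioned later in this thread
    kubeconfig="/etc/kubernetes/ci-staging.config",  # path from the comment above
)
print(" ".join(argv))
# subprocess.run(argv, env=env, check=True) would perform the actual install.
```

Building the command separately from running it keeps the sketch safe to execute on a machine without helm or cluster access.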

Thanks, that helps already!
Can you elaborate/point me to the discussion on how this is going to be implemented on GitLab runners? I'm planning to replace all static user tokens on the k8s clusters with client certificates from the PKI infrastructure. That will definitely interfere with how GitLab runners authenticate to the k8s API.

Change 912785 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] Remove profile::kubernetes::deployment_server from role::releases

https://gerrit.wikimedia.org/r/912785

I planned to store the secret in GitLab, associated with a repo that will hold the code for doing a test deployment to the staging cluster. So it sounds like we'll need a way to automatically update the secret whenever the certificate is renewed.

That sounds like it would not currently block me from migrating away from tokens (https://gerrit.wikimedia.org/r/c/operations/puppet/+/904500/12), since that is still in the planning phase on your side. Is my understanding correct?

Agreed, as long as it still generates a working kubeconfig file of the same name.
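
To make the token-to-certificate switch concrete, here is a rough sketch of a kubeconfig whose user entry authenticates with a client certificate and key instead of a static token. It is not the Puppet-generated file: the cluster name, server URL, user name, and all file paths are hypothetical. The structure is written out as JSON, which kubeconfig loaders generally accept since it is also valid YAML.

```python
import json


def build_kubeconfig(cluster, server, ca_path, cert_path, key_path, user="ci"):
    """Build a kubeconfig dict that authenticates with a client certificate."""
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [
            {"name": cluster, "cluster": {"server": server, "certificate-authority": ca_path}}
        ],
        # A token-based entry would carry {"token": "..."} here instead.
        "users": [
            {"name": user, "user": {"client-certificate": cert_path, "client-key": key_path}}
        ],
        "contexts": [{"name": cluster, "context": {"cluster": cluster, "user": user}}],
        "current-context": cluster,
    }


config = build_kubeconfig(
    cluster="staging",  # hypothetical cluster name
    server="https://kubernetes.example.org:6443",  # hypothetical API server URL
    ca_path="/etc/ssl/example/ca.pem",  # hypothetical paths
    cert_path="/etc/ssl/example/ci.pem",
    key_path="/etc/ssl/example/ci.key",
)
print(json.dumps(config, indent=2))
```

Because only the `users` entry changes, a consumer that reads the file by name keeps working as long as the file name and context stay the same.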

Change 912785 merged by JMeybohm:

[operations/puppet@production] Remove profile::kubernetes::deployment_server from role::releases

https://gerrit.wikimedia.org/r/912785

Change #1111577 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] kubernetes: enable selecting clusters for deployment_server

https://gerrit.wikimedia.org/r/1111577

Change #1111577 merged by Filippo Giunchedi:

[operations/puppet@production] kubernetes: enable selecting clusters for deployment_server

https://gerrit.wikimedia.org/r/1111577

I stumbled upon this again recently and I think the current configuration does not allow pod creation at all with the credentials on the contint servers. Not sure exactly when those broke, but it makes me believe that the credentials are no longer used in CI (jenkins) and we can maybe finally remove them from contint servers (and maybe even remove the "ci" namespace from k8s clusters). @dancy | @dduvall wdyt?

Using CodeSearch, I don't find any pipeline configuration which uses the deploy stage (which is what would actually run helm to do stuff in the ci namespace), so I have no objection to removing this cruft.

dancy claimed this task.

These requirements are currently fulfilled by scap and the mwdebug servers during scap sync-world, therefore I'm considering this ticket to be resolved.

OK, cool. Let's keep this open until the code has been removed.

Change #1125119 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] Remove profile::kubernetes::* from role::ci

https://gerrit.wikimedia.org/r/1125119

Change #1125119 merged by Jelto:

[operations/puppet@production] Remove profile::kubernetes::* from role::ci

https://gerrit.wikimedia.org/r/1125119

Change #1126964 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] deployment_server: Remove special handling of ci user

https://gerrit.wikimedia.org/r/1126964

Change #1126964 merged by JMeybohm:

[operations/puppet@production] deployment_server: Remove special handling of ci user

https://gerrit.wikimedia.org/r/1126964

Change #1143802 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Remove ci namespace from wikikube staging clusters

https://gerrit.wikimedia.org/r/1143802

Change #1143802 merged by jenkins-bot:

[operations/deployment-charts@master] Remove ci namespace from wikikube staging clusters

https://gerrit.wikimedia.org/r/1143802

JMeybohm claimed this task.

All related changes have been reverted