
move micro site annual.wikimedia.org and 15.wikipedia.org to kubernetes
Closed, Resolved · Public

Description

Tracking the migration of all microsites in T300171 is getting confusing, so I'm opening a subtask for the annualreport microsite (even though most of the work is already done).

Migration checklist (see complete steps here):

Event Timeline

Restricted Application added a subscriber: Aklapper.
Jelto triaged this task as Medium priority. May 19 2023, 2:08 PM
Jelto moved this task from Incoming to Work in Progress on the collaboration-services board.

Change 922473 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/deployment-charts@master] miscweb: disableDefaultHosts in ingress

https://gerrit.wikimedia.org/r/922473

Change 922473 merged by jenkins-bot:

[operations/deployment-charts@master] miscweb: disableDefaultHosts in ingress

https://gerrit.wikimedia.org/r/922473

Change 922500 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] service: add host to miscweb http probe

https://gerrit.wikimedia.org/r/922500

Change 922500 merged by Jelto:

[operations/puppet@production] service: add host to miscweb http probe

https://gerrit.wikimedia.org/r/922500

The Kubernetes ingress configuration needed some debugging, as we are the first service using multiple ingress configs in one namespace. Thanks to @JMeybohm for help troubleshooting this!

With the newest fix deployed, static-bugzilla, annual and 15.wikipedia.org work fine now:

curl -I --resolve static-bugzilla.wikimedia.org:30443:$(dig +short k8s-ingress-wikikube.svc.eqiad.wmnet) https://static-bugzilla.wikimedia.org:30443
HTTP/2 200 
date: Tue, 23 May 2023 11:57:16 GMT
content-location: index.html.gz
content-encoding: gzip
content-type: text/html

curl -I --resolve annual.wikimedia.org:30443:$(dig +short k8s-ingress-wikikube.svc.eqiad.wmnet) https://annual.wikimedia.org:30443
HTTP/2 302 
date: Tue, 23 May 2023 11:57:21 GMT
location: https://wikimediafoundation.org/about/annualreport/current/
content-type: text/html; charset=iso-8859-1

curl -I --resolve 15.wikipedia.org:30443:$(dig +short k8s-ingress-wikikube.svc.eqiad.wmnet) https://15.wikipedia.org:30443
HTTP/2 200 
date: Tue, 23 May 2023 11:57:26 GMT
content-length: 23636
content-type: text/html
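
The three checks above share one pattern: pin the site's hostname to the ingress address with --resolve and send a HEAD request to port 30443. A minimal Python sketch of how the same command could be generated per host follows. The function name curl_check_command and the address 192.0.2.10 are placeholders made up for illustration; the real address comes from the dig lookup shown above.

```python
import shlex

INGRESS_PORT = 30443  # ingress port used in the curl checks above


def curl_check_command(host: str, ingress_ip: str, port: int = INGRESS_PORT) -> str:
    """Return a curl command that sends a HEAD request for `host` to `ingress_ip`.

    --resolve makes curl connect to `ingress_ip` while still sending the
    original hostname for TLS SNI and the Host header.
    """
    argv = [
        "curl",
        "-I",
        "--resolve",
        f"{host}:{port}:{ingress_ip}",
        f"https://{host}:{port}",
    ]
    return shlex.join(argv)


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only placeholder address.
    for site in (
        "static-bugzilla.wikimedia.org",
        "annual.wikimedia.org",
        "15.wikipedia.org",
    ):
        print(curl_check_command(site, "192.0.2.10"))
```

The sketch only prints the commands, so it can be reviewed before anything is sent to the ingress.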

So the next step would be to switch the service to the new backend (https://gerrit.wikimedia.org/r/c/operations/puppet/+/761062 and an additional change for annual.wikimedia.org).

Change 761062 had a related patch set uploaded (by Jelto; author: Dzahn):

[operations/puppet@production] trafficserver: switch 15.wikipedia.org backend

https://gerrit.wikimedia.org/r/761062

Change 761062 merged by Dzahn:

[operations/puppet@production] trafficserver: switch 15.wikipedia.org backend

https://gerrit.wikimedia.org/r/761062

Change 922791 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] trafficserver: switch annual.wikimedia.org backend

https://gerrit.wikimedia.org/r/922791

Thanks @Dzahn for switching the traffic for 15.wikipedia.org to the new Kubernetes backend! I also ran some tests, and my requests are hitting the Kubernetes pods and not the legacy miscweb VMs.

See also https://grafana.wikimedia.org/d/b1jttnFMz/envoy-telemetry-k8s?orgId=1&var-datasource=thanos&var-site=eqiad&var-prometheus=k8s&var-app=miscweb&var-destination=All&from=1684908465819&to=1684909987347&viewPanel=4 (we may need a release-specific dashboard here).

I think we can continue with switching the traffic for annual.wikimedia.org too. I prepared the above change. I'd be ok to do the deploy asynchronously too.

Change 922791 merged by Jelto:

[operations/puppet@production] trafficserver: switch annual.wikimedia.org backend

https://gerrit.wikimedia.org/r/922791

The annual service was switched to Kubernetes. The last traffic hit on the legacy VM was at 09:08:28 UTC.

We can continue with the cleanup. As with 15.wikipedia.org, the blackbox monitor will fail if we remove the site from the old miscweb VM. I'll try to find a way to configure the blackbox monitor for the service somewhere else (maybe in service::catalog, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/923263).

There is a solution for pointing the blackbox check to the new Kubernetes Service:

  • set port => 30443 (the ingress Kubernetes port)
  • set ip4 => ipresolve('miscweb.discovery.wmnet', 4)

Added in change https://gerrit.wikimedia.org/r/c/operations/puppet/+/923342.

Logs report:

target=https://[10.2.2.70]:30443/ msg="Probe succeeded" duration_seconds=0.104886251
target=https://[10.2.2.70]:30443/ msg="Received HTTP response" status_code=200
target=https://[10.2.2.70]:30443/ msg="Resolved target address" ip=10.2.2.70
target=https://[10.2.2.70]:30443/ msg="Making HTTP request" url=https://10.2.2.70:30443/
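
For anyone who wants to check such probe output automatically, here is a rough Python sketch that parses the key=value lines above and reports whether the probe succeeded. The helper names parse_logfmt and probe_succeeded are made up for this example, and the parsing assumes the simple key=value / key="quoted value" layout seen in these four lines.

```python
import shlex

# The probe log lines quoted above, newest first.
SAMPLE_LOG = [
    'target=https://[10.2.2.70]:30443/ msg="Probe succeeded" duration_seconds=0.104886251',
    'target=https://[10.2.2.70]:30443/ msg="Received HTTP response" status_code=200',
    'target=https://[10.2.2.70]:30443/ msg="Resolved target address" ip=10.2.2.70',
    'target=https://[10.2.2.70]:30443/ msg="Making HTTP request" url=https://10.2.2.70:30443/',
]


def parse_logfmt(line: str) -> dict:
    """Split one key=value log line into a dict, honoring double quotes."""
    record = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            record[key] = value
    return record


def probe_succeeded(lines) -> bool:
    """Return True if any line carries the 'Probe succeeded' message."""
    return any(parse_logfmt(line).get("msg") == "Probe succeeded" for line in lines)


if __name__ == "__main__":
    for entry in SAMPLE_LOG:
        print(parse_logfmt(entry))
    print("probe succeeded:", probe_succeeded(SAMPLE_LOG))
```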

So we should be good to do the remaining cleanup.

Change 924888 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] microsites: move blackbox checks to dedicated monitoring profile

https://gerrit.wikimedia.org/r/924888

Change 924888 merged by Jelto:

[operations/puppet@production] microsites: move blackbox checks to dedicated monitoring profile

https://gerrit.wikimedia.org/r/924888

Jelto reassigned this task from Jelto to Dzahn. Edited May 31 2023, 9:16 AM

Cleanup of puppet code done.

The last part is to archive the Gerrit project https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia/annualreport/.

@Dzahn can you archive the project and put a [ARCHIVED], moved to https://gitlab.wikimedia.org/repos/sre/miscweb/annualreport/ in the project description?

The readiness probe for the annual service throws a warning because of the HTTP 302 response code.

Readiness probe warning: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://wikimediafoundation.org/about/annualreport/current/">here</a>.</p>
</body></html>

https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-k8s-1-7.0.0-1-2023.06.01?id=O5vHdogBs53OSt3daRMS

I'll add a patch to point the readiness probe to a path that returns HTTP 200. Thanks @JMeybohm for spotting this!
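
To illustrate why the 302 shows up as a warning rather than a failure, here is a simplified Python sketch of how an HTTP probe result could be classified. It rests on my understanding of kubelet behavior, which is an assumption and not something verified in this task: status codes from 200 to 399 count as success, and a redirect to a different hostname is not followed but reported as a success with a warning. The function classify_http_probe and its return labels are made up for this example.

```python
from typing import Optional
from urllib.parse import urlsplit


def classify_http_probe(status: int, probe_host: str, location: Optional[str] = None) -> str:
    """Classify a probe response as 'success', 'warning' or 'failure'.

    Simplified model (assumption): a redirect whose Location points at a
    different hostname than the probed one yields a warning; any other
    status in the 200-399 range is a plain success.
    """
    if 300 <= status < 400 and location:
        redirect_host = urlsplit(location).hostname
        if redirect_host and redirect_host != probe_host:
            return "warning"
    if 200 <= status < 400:
        return "success"
    return "failure"


if __name__ == "__main__":
    # The annual site redirects to wikimediafoundation.org, as seen in the curl output.
    print(classify_http_probe(
        302,
        "annual.wikimedia.org",
        "https://wikimediafoundation.org/about/annualreport/current/",
    ))
    # A probe path that answers 200 directly avoids the warning.
    print(classify_http_probe(200, "annual.wikimedia.org"))
```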

Change 925751 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/deployment-charts@master] miscweb: change path for probes

https://gerrit.wikimedia.org/r/925751

Change 925751 merged by jenkins-bot:

[operations/deployment-charts@master] miscweb: change path for readiness probe

https://gerrit.wikimedia.org/r/925751

ACK, I will take care of the archiving / Gerrit readme soon.

Mentioned in SAL (#wikimedia-operations) [2023-06-01T16:06:00Z] <mutante> gerrit - set repo wikimedia/annualreport to readonly (from active) - T337041

@Dzahn can you archive the project and put a [ARCHIVED], moved to https://gitlab.wikimedia.org/repos/sre/miscweb/annualreport/ in the project description?

DONE!

Dzahn updated the task description.

Mentioned in SAL (#wikimedia-operations) [2023-06-21T18:09:54Z] <mutante> miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/annualreport T337041