jijiki (effie mouzeli)
is an animal

Projects (11)

User Details

User Since
Aug 14 2018, 10:50 AM (202 w, 2 d)
Availability
Available
IRC Nick
effie
LDAP User
Effie Mouzeli
MediaWiki User
EMouzeli (WMF)

Recent Activity

Nov 10 2021

Dzahn awarded T235425: webperf*002 running out of disk space (arc lamp, xhgui) a Grey Medal token.
Nov 10 2021, 11:12 PM · Arc-Lamp, serviceops, SRE, Performance-Team

Oct 27 2021

jijiki closed T280497: Benchmark performance of MediaWiki on k8s as Resolved.

Production URL testing (1,929,416 URLs): results are in https://people.wikimedia.org/~akosiaris/prod_urls/. Findings for c=20, c=30, and c=40 are consistent with what we have seen so far.

Oct 27 2021, 8:55 PM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki closed T280497: Benchmark performance of MediaWiki on k8s, a subtask of T290536: Serve production traffic via Kubernetes, as Resolved.
Oct 27 2021, 8:55 PM · Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
jijiki updated the task description for T293630: Investigate performance degradation at high concurrencies in php-fpm.
Oct 27 2021, 9:53 AM · serviceops, Performance-Team

Oct 26 2021

jijiki added a comment to T280497: Benchmark performance of MediaWiki on k8s.

Parsoid testing: the original images can be found at https://people.wikimedia.org/~jiji/benchmarks-parsoid/. Our findings are similar to our previous tests: baremetal performs better at low concurrencies, while k8s performs better at c=15 and up, although its >p90 is not always great.

Oct 26 2021, 11:14 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki updated the task description for T280497: Benchmark performance of MediaWiki on k8s.
Oct 26 2021, 7:08 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE

Oct 19 2021

jijiki added a comment to T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.

There's no reason for T263437 to be a sub task? It's unrelated work and only needed when we move to a new OS (with a new ICU), but not when we merely migrate to a new PHP release.

Oct 19 2021, 6:38 PM · Performance-Team (Radar), serviceops
jijiki added a comment to T290536: Serve production traffic via Kubernetes.
Oct 19 2021, 3:28 PM · Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s

Oct 18 2021

jijiki updated the task description for T293216: Upgrade mc* and mc-gp* hosts to Debian Bullseye.
Oct 18 2021, 4:23 PM · serviceops
jijiki added a subtask for T280497: Benchmark performance of MediaWiki on k8s: T293630: Investigate performance degradation at high concurrencies in php-fpm.
Oct 18 2021, 2:09 PM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki added a parent task for T293630: Investigate performance degradation at high concurrencies in php-fpm: T280497: Benchmark performance of MediaWiki on k8s.
Oct 18 2021, 2:09 PM · serviceops, Performance-Team
jijiki created T293630: Investigate performance degradation at high concurrencies in php-fpm.
Oct 18 2021, 2:08 PM · serviceops, Performance-Team
jijiki awarded T263437: Allow easier ICU transitions in MediaWiki (change how sortkey collation is managed in the categorylinks table) a Baby Tequila token.
Oct 18 2021, 1:40 PM · Performance-Team, MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), Patch-For-Review, Platform Engineering Code Jam, Platform Engineering Roadmap Decision Making, MediaWiki-General, SRE
jijiki added a parent task for T263437: Allow easier ICU transitions in MediaWiki (change how sortkey collation is managed in the categorylinks table): T271736: Migrate WMF production from PHP 7.2 to PHP 7.4.
Oct 18 2021, 1:38 PM · Performance-Team, MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), Patch-For-Review, Platform Engineering Code Jam, Platform Engineering Roadmap Decision Making, MediaWiki-General, SRE
jijiki added a subtask for T271736: Migrate WMF production from PHP 7.2 to PHP 7.4: T263437: Allow easier ICU transitions in MediaWiki (change how sortkey collation is managed in the categorylinks table).
Oct 18 2021, 1:38 PM · Performance-Team (Radar), serviceops
jijiki added a comment to T293451: Allow sending traffic to php 7.2 or 7.4 selectively in the apache configuration for MediaWiki.

Which brings again our usual issue, to cache slot or not to cache slot? If we don't want to slot the caches, we could consider following what we are planning to do in T290536

Oct 18 2021, 1:37 PM · serviceops

Oct 16 2021

jijiki renamed T258779: Roll out remote-DC gutter pool for /*/mw-wan/ from Roll out remote gutter pool to Roll out remote-DC gutter pool for /*/mw-wan/.
Oct 16 2021, 5:24 AM · Patch-For-Review, User-jijiki, serviceops

Oct 15 2021

jijiki closed T227265: mcrouter codfw proxies sometimes lead to TKOs as Declined.

We are not using proxies anymore. Some of the TKOs we see every now and then could be related to T291385, but there is not much we can do, so closing.

Oct 15 2021, 2:42 PM · Performance-Team (Radar), User-Elukey, serviceops, SRE
jijiki closed T227265: mcrouter codfw proxies sometimes lead to TKOs, a subtask of T244852: Upgrade and improve our application object caching service (memcached), as Declined.
Oct 15 2021, 2:41 PM · Patch-For-Review, SRE, serviceops
jijiki closed T245841: mcrouter proxies and scap proxies as Invalid.

Since we have no mcrouter proxies, and we won't have any scap proxies in the future, closing.

Oct 15 2021, 2:01 PM · SRE, serviceops
jijiki renamed T258779: Roll out remote-DC gutter pool for /*/mw-wan/ from Roll out proxy gutter pool to Roll out remote gutter pool.
Oct 15 2021, 5:32 AM · Patch-For-Review, User-jijiki, serviceops

Oct 14 2021

jijiki updated the task description for T293012: Productionise mc20[38-55].
Oct 14 2021, 4:21 PM · Patch-For-Review, serviceops

Oct 13 2021

jijiki created T293216: Upgrade mc* and mc-gp* hosts to Debian Bullseye.
Oct 13 2021, 11:52 AM · serviceops
jijiki updated subscribers of T210580: Write puppet for redis-sentinel.

@Ladsgroup is this work still in progress or abandoned?

Oct 13 2021, 11:41 AM · Infrastructure-Foundations, Machine-Learning-Team, Puppet, ORES
jijiki added a project to T293063: Write and adapt Runbooks and cookbooks related to the WDQS Streaming Updater and kubernetes: Prod-Kubernetes.
Oct 13 2021, 10:31 AM · Discovery-Search (Current work), Prod-Kubernetes, wdwb-tech, serviceops, Wikidata, Wikidata-Query-Service
jijiki renamed T293063: Write and adapt Runbooks and cookbooks related to the WDQS Streaming Updater and kubernetes from Write and adapt Runbooks related to the WDQS Streaming Updater and kubernetes to Write and adapt Runbooks and cookbooks related to the WDQS Streaming Updater and kubernetes.
Oct 13 2021, 10:30 AM · Discovery-Search (Current work), Prod-Kubernetes, wdwb-tech, serviceops, Wikidata, Wikidata-Query-Service

Oct 12 2021

jijiki added a comment to T292646: Reduce latency of new Scap releases.

@Legoktm we may debdeploy scap everywhere, and then for whatever reason we need to push change Y fast due to issue X. If scap fails everywhere because of a bug we missed, we have a problem where we first need to downgrade scap, and then rerun it. In my opinion, we should keep having scap sit on the canaries for 1 day, and save ourselves from a potential scenario like this. To my knowledge, scap's test coverage is rather low (I admit I have not read scap code for quite some time). If this is still the case, it gives us one more reason to want to be a little bit more careful with its rollout.

Oct 12 2021, 7:13 PM · serviceops, Release-Engineering-Team (Doing), SRE
jijiki updated subscribers of T292646: Reduce latency of new Scap releases.

@Legoktm is working on a cookbook to speed up packaging of scap https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/727605. The rollout process has to stay as it is though (upgrade on canaries first, and roll out to all hosts after 1-2 days)

Oct 12 2021, 10:45 AM · serviceops, Release-Engineering-Team (Doing), SRE

Oct 11 2021

jijiki added a subtask for T293012: Productionise mc20[38-55]: Unknown Object (Task).
Oct 11 2021, 4:08 PM · Patch-For-Review, serviceops
jijiki created T293012: Productionise mc20[38-55].
Oct 11 2021, 4:08 PM · Patch-For-Review, serviceops

Oct 8 2021

jijiki updated the task description for T292390: Upgrade all deployment charts to use the latest version of common_templates.
Oct 8 2021, 2:04 PM · Patch-For-Review, good first task, SRE, serviceops
jijiki added a comment to T280497: Benchmark performance of MediaWiki on k8s.

After the last tuning (APCu + memory limits), the results were more promising:

Oct 8 2021, 11:41 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki updated the task description for T280497: Benchmark performance of MediaWiki on k8s.
Oct 8 2021, 11:21 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki added a comment to T282148: Support Canary releases on Kubernetes.

I think what we are missing here is how to get prometheus metrics strictly for the canary deployment. I confess I have not dug deeper into this.

Oct 8 2021, 9:27 AM · serviceops

Oct 7 2021

jijiki added a subtask for T290536: Serve production traffic via Kubernetes: T292707: Migrate Wikitech to Kubernetes.
Oct 7 2021, 11:10 AM · Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
jijiki added a parent task for T292707: Migrate Wikitech to Kubernetes: T290536: Serve production traffic via Kubernetes.
Oct 7 2021, 11:10 AM · wikitech.wikimedia.org, MW-on-K8s, serviceops
jijiki renamed T292707: Migrate Wikitech to Kubernetes from Move Wikitech to Kubernetes to Migrate Wikitech to Kubernetes.
Oct 7 2021, 11:10 AM · wikitech.wikimedia.org, MW-on-K8s, serviceops
jijiki updated the task description for T290536: Serve production traffic via Kubernetes.
Oct 7 2021, 11:05 AM · Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
jijiki created T292707: Migrate Wikitech to Kubernetes.
Oct 7 2021, 11:00 AM · wikitech.wikimedia.org, MW-on-K8s, serviceops
jijiki added a parent task for T280497: Benchmark performance of MediaWiki on k8s: T290536: Serve production traffic via Kubernetes.
Oct 7 2021, 10:20 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki added a parent task for T288848: Make HTTP calls work within mediawiki on kubernetes: T290536: Serve production traffic via Kubernetes.
Oct 7 2021, 10:20 AM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), Patch-For-Review, MW-1.37-notes (1.37.0-wmf.20; 2021-08-23), MW-on-K8s, serviceops, SRE
jijiki added subtasks for T290536: Serve production traffic via Kubernetes: T288848: Make HTTP calls work within mediawiki on kubernetes, T280497: Benchmark performance of MediaWiki on k8s.
Oct 7 2021, 10:20 AM · Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
jijiki updated subscribers of T280497: Benchmark performance of MediaWiki on k8s.

Running some tests (c=60, ~1.9m URLs) against mwdebug services, we found 2 issues:

Oct 7 2021, 9:45 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki added a comment to T283159: Deploy tegola-vector-tiles to kubernetes.

Tegola is running on kubernetes, Maps mirrored 100% of production traffic where we had no SRE-swift-storage issues. 🎉

Oct 7 2021, 8:51 AM · Patch-For-Review, User-jijiki, serviceops, Maps
jijiki closed T283159: Deploy tegola-vector-tiles to kubernetes as Resolved.
Oct 7 2021, 8:49 AM · Patch-For-Review, User-jijiki, serviceops, Maps
jijiki closed T283159: Deploy tegola-vector-tiles to kubernetes, a subtask of T274390: New Service Request tegola-vector-tiles, as Resolved.
Oct 7 2021, 8:49 AM · serviceops, Maps, Product-Infrastructure-Team-Backlog, Service-deployment-requests, Services, SRE
jijiki created T292694: Create a dedicated tegola postgres user.
Oct 7 2021, 8:48 AM · serviceops, Maps

Oct 6 2021

jijiki closed T291095: Deploy Scap version 4.0.2 as Resolved.
Oct 6 2021, 5:23 AM · Release-Engineering-Team (Doing), serviceops, Scap

Oct 4 2021

jijiki added a comment to T291095: Deploy Scap version 4.0.2.

@dancy it would be lovely if we can speed this up, right now we have deploy1002 and maps* on version 3.17.1, and the rest on version 4.0.0.

Sorry @jijiki I was out sick yesterday so I lost a day getting things fixed up.

In the future if there's a problem w/ a new scap release, go ahead and roll back on all machines so we don't leave anybody blocked.

Oct 4 2021, 5:17 PM · Release-Engineering-Team (Doing), serviceops, Scap
jijiki added a comment to T280497: Benchmark performance of MediaWiki on k8s.

I ran an initial test with some thousands of production URLs. It appears that we are about to hit max_accelerated_files (currently 7963 x 12 pods = 95556). Looking at the same value on our production servers, 16229 is a possible value to set before moving forward. We will see if we need to bump opcache too.
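
As a rough illustration of the numbers in this comment: per the PHP manual, opcache.max_accelerated_files is rounded up to the first value in a fixed set of primes, which is why both 7963 and 16229 show up as settings. The sketch below reproduces that rounding and the 7963 x 12 aggregate. The function names are hypothetical, and the prime list is the one documented in the PHP manual as far as I know; this is not code from the task.

```python
# Prime sizes documented for opcache.max_accelerated_files (assumption:
# this list matches the PHP manual for the PHP release in use).
OPCACHE_PRIMES = [
    223, 463, 983, 1979, 3907, 7963, 16229, 32531,
    65407, 130987, 262237, 524521, 1048793,
]


def effective_max_accelerated_files(configured: int) -> int:
    """Return the first documented prime that is >= the configured value."""
    for prime in OPCACHE_PRIMES:
        if prime >= configured:
            return prime
    # Above the documented set: this sketch just returns the largest entry.
    return OPCACHE_PRIMES[-1]


def aggregate_slots(per_pod_limit: int, pods: int) -> int:
    """Total hash-table slots across pods (each pod has its own opcache)."""
    return per_pod_limit * pods


print(aggregate_slots(7963, 12))               # 95556
print(effective_max_accelerated_files(10000))  # 16229
```

Note that the aggregate is only a fleet-wide figure; each pod still hits its own per-instance limit.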

Oct 4 2021, 5:32 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki added a comment to T280497: Benchmark performance of MediaWiki on k8s.

@Joe did so, thanks.

Oct 4 2021, 5:28 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE

Sep 30 2021

jijiki added a comment to T291095: Deploy Scap version 4.0.2.

@dancy it would be lovely if we can speed this up, right now we have deploy1002 and maps* on version 3.17.1, and the rest on version 4.0.0.

Sep 30 2021, 11:50 AM · Release-Engineering-Team (Doing), serviceops, Scap

Sep 29 2021

jijiki added a comment to T291918: Re-think how we separate traffic to mediawiki in clusters..

I forgot to add: we probably also want to migrate wikitech early in the process. It will need us to add php-ldap to our debug image, but it should allow us to dogfood the new installation early *and* to normalize as much as possible wikitech in the process.

Sep 29 2021, 2:30 PM · MW-on-K8s, SRE, serviceops
jijiki added a comment to T291918: Re-think how we separate traffic to mediawiki in clusters..

Naming things is hard though, I do not agree with the kube prefix, in the future after baremetal mediawiki servers are gone, it will be a prefix that does not mean much. Moreover, we probably want the discovery URLs to derive from the name of the services. I would propose:

The kube- prefix is only needed now; it will not be needed once we've moved everything to kubernetes as we will be able to remove the conditionals from mediawiki-config that depend on the servergroup.
Of course, the servergroup is just a label that has a specific use, we can mostly ignore it.

Sep 29 2021, 2:28 PM · MW-on-K8s, SRE, serviceops
jijiki added a comment to T291918: Re-think how we separate traffic to mediawiki in clusters..

The first scenario I proposed in T290536 goes as follows:

  • One cluster for first deploy/debug purposes (kube-mwdebug)
  • One cluster to serve internal requests to the API (and possibly to wiki pages) (kube-api-internal)
  • One cluster to serve public API traffic (kube-api-external)
  • One cluster to serve the website, both mobile and desktop (kube-wikis)
  • One cluster for jobrunning (kube-jobrunner)
  • One cluster for videoscaling (if we can't move it to shellbox)

We could then split these clusters further between group0/group1/group2 wikis, or across database sections, but that would probably be done with a logical split at the kubernetes level (with say an ingress, or announcing services IPs) and not at an LVS level and represents a second layer of complication that we shouldn't get into right now.
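
The functional split proposed above can be sketched as a simple lookup from traffic class to cluster. The cluster names come from the list; the `Traffic` categories, the `classify` flags, and their precedence order are hypothetical, and the videoscaling cluster is left out because the comment gives it no name. Internal requests for wiki pages are sent to the web cluster here, which is a simplification of the "possibly to wiki pages" note.

```python
from enum import Enum


class Traffic(Enum):
    DEBUG = "debug"
    API_INTERNAL = "api-internal"
    API_EXTERNAL = "api-external"
    WEB = "web"
    JOBS = "jobs"


# Cluster names as listed in the comment; the mapping itself is illustrative.
CLUSTER_FOR = {
    Traffic.DEBUG: "kube-mwdebug",
    Traffic.API_INTERNAL: "kube-api-internal",
    Traffic.API_EXTERNAL: "kube-api-external",
    Traffic.WEB: "kube-wikis",
    Traffic.JOBS: "kube-jobrunner",
}


def classify(*, debug: bool = False, job: bool = False,
             api: bool = False, internal: bool = False) -> Traffic:
    """Pick a traffic class; debug and job requests take precedence."""
    if debug:
        return Traffic.DEBUG
    if job:
        return Traffic.JOBS
    if api:
        return Traffic.API_INTERNAL if internal else Traffic.API_EXTERNAL
    return Traffic.WEB


def cluster_for(**flags: bool) -> str:
    """Return the cluster name for a request described by the given flags."""
    return CLUSTER_FOR[classify(**flags)]


print(cluster_for(api=True, internal=True))  # kube-api-internal
print(cluster_for())                         # kube-wikis
```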

Sep 29 2021, 12:27 PM · MW-on-K8s, SRE, serviceops
jijiki triaged T291990: Scap error when deploying kartotherian as High priority.
Sep 29 2021, 11:42 AM · serviceops, Scap
jijiki added a comment to T291990: Scap error when deploying kartotherian.

Just a heads up, this is currently blocking us from pushing a couple of changes to kartotherian to test our prod environments in k8s which is currently our main task.

Sep 29 2021, 11:42 AM · serviceops, Scap
jijiki updated the task description for T280767: Maps 2.0 roll-out plan.
Sep 29 2021, 11:36 AM · Patch-For-Review, User-jijiki, serviceops, Maps, Product-Infrastructure-Team-Backlog
jijiki added a comment to T290536: Serve production traffic via Kubernetes.

That's currently my preferred way cause it's deterministic (Got the cookie? go to k8s!).

We could also do the LVS dance of course. Add a number of kubernetes nodes to the clusters and start shifting weights. We've done that in the past and it kinda works. The problem is that a) it's not deterministic and b) it only kinda works, due to envoy persistent connections messing up the scheme.

If we wanna go the random not persistent way, there is also the choice of having ATS do the balancing, which would avoid the persistent connections issue, but as far as I know we don't currently have support for it in puppet. I doubt Traffic would be thrilled to add it, plus it's duplicating what we do with LVS.
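
To make the "deterministic vs. not" distinction concrete, here is a minimal sketch contrasting the two routing styles discussed above. It is not how the edge, LVS, or ATS are configured: the cookie name `use-k8s`, the backend labels, and both functions are hypothetical, and persistent connections are not modelled.

```python
import random
from typing import Mapping

K8S_COOKIE = "use-k8s"  # hypothetical cookie name, for illustration only


def route_by_cookie(cookies: Mapping[str, str]) -> str:
    """Deterministic: the same request always lands on the same backend."""
    return "k8s" if K8S_COOKIE in cookies else "baremetal"


def route_by_weight(k8s_weight: int, baremetal_weight: int,
                    rng: random.Random) -> str:
    """Weighted random choice: only the long-run traffic share is controlled."""
    return rng.choices(
        ["k8s", "baremetal"],
        weights=[k8s_weight, baremetal_weight],
    )[0]


rng = random.Random(42)
print(route_by_cookie({K8S_COOKIE: "1"}))  # k8s
print(route_by_cookie({}))                 # baremetal
share = sum(route_by_weight(1, 9, rng) == "k8s" for _ in range(10_000)) / 10_000
print(f"k8s share with weights 1:9 is roughly {share:.2f}")
```

With the cookie, a given request can be reproduced against k8s at will; with weights, any single request may land on either side.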

"Invalidate k8s rendered cache" refers to edge caches; sorry about this, I updated the description to avoid confusion.

Thanks. I think the answer is the same way we invalidate edge caches currently.

Sep 29 2021, 11:14 AM · Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
jijiki renamed T276994: Provide an mwdebug functionality on kubernetes from Investigate how we can provide an mwdebug functionality on kubernetes to Provide an mwdebug functionality on kubernetes.
Sep 29 2021, 5:46 AM · Release-Engineering-Team (Next), serviceops, MW-on-K8s
jijiki closed T262202: Create a separate 'mwdebug' cluster as Resolved.
Sep 29 2021, 5:36 AM · Developer Productivity, WikimediaDebug, Performance-Team (Radar), Release-Engineering-Team (Radar), Analytics-Radar, observability, serviceops, User-jijiki
jijiki added a comment to T262202: Create a separate 'mwdebug' cluster.

@Krinkle I agree that we should come up with a complete solution for this. I will close this task and we can continue this discussion in T276994

Sep 29 2021, 5:34 AM · Developer Productivity, WikimediaDebug, Performance-Team (Radar), Release-Engineering-Team (Radar), Analytics-Radar, observability, serviceops, User-jijiki

Sep 28 2021

jijiki updated the task description for T290536: Serve production traffic via Kubernetes.
Sep 28 2021, 11:46 AM · Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
jijiki added a comment to T290536: Serve production traffic via Kubernetes.

We could thus start with migrating the internal traffic first, starting with parsoid and the internal api traffic. It will be enough to change the pointer in envoy to api-internal-r{w,o}.discovery.wmnet to move each service to the new internal api cluster.

I like the idea of dogfooding; api-internal-ro.discovery.wmnet is definitely a good start. My concern is that, if we have migrated services one by one and, for any emergency reason, we want to temporarily switch them all back to api-r{w,o}, it will take a considerable amount of time (redeploying every service that uses api-internal-ro.discovery.wmnet). Please correct me if I am missing something.

Likewise, we can progressively move external api traffic to api-external-rw.discovery.wmnet, and a fraction of the production traffic for the wikis to wiki-rw.discovery.wmnet, both clusters we'll build on kubernetes.

Similarly, we can split jobrunners vs videoscalers functionally again. We are also no longer limited to 1 instance of mwdebug (although exposing them might be a bit more involved) but can have as many as we want.

Sep 28 2021, 11:44 AM · Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
jijiki created T291918: Re-think how we separate traffic to mediawiki in clusters..
Sep 28 2021, 11:23 AM · MW-on-K8s, SRE, serviceops
jijiki updated subscribers of T291095: Deploy Scap version 4.0.2.

@Ladsgroup ran into this error:

Sep 28 2021, 10:57 AM · Release-Engineering-Team (Doing), serviceops, Scap

Sep 27 2021

jijiki closed T291052: Deploy PHP patch for DOM replaceChild/removeChild performance as Resolved.
Sep 27 2021, 6:51 AM · Patch-For-Review, SRE, serviceops

Sep 24 2021

jijiki updated the task description for T283159: Deploy tegola-vector-tiles to kubernetes.
Sep 24 2021, 2:32 PM · Patch-For-Review, User-jijiki, serviceops, Maps

Sep 22 2021

jijiki added a comment to T290536: Serve production traffic via Kubernetes.

I have some alternative ideas. Specifically, right now we have a limited number of different clusters, due to the complexity of correctly sizing such clusters on bare metal and the complications coming from the fact that switching clusters for a server basically meant a reimage.

Kubernetes removes most of such limitations, and I think we should move away from the current appserver/api split, to a more structured approach. This might also help with the migration.

First of all, I'd like to separate the traffic coming from internal requests to the mediawiki APIs from the external api traffic. This should allow us to more easily sacrifice external api traffic when we're in an overload situation, while not sacrificing the internal traffic as well.

We could thus start with migrating the internal traffic first, starting with parsoid and the internal api traffic. It will be enough to change the pointer in envoy to api-internal-r{w,o}.discovery.wmnet to move each service to the new internal api cluster.

Sep 22 2021, 10:53 AM · Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
jijiki added a comment to T280497: Benchmark performance of MediaWiki on k8s.

Thank you @ssastry, I updated the task descr to include them

Sep 22 2021, 8:34 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki updated the task description for T280497: Benchmark performance of MediaWiki on k8s.
Sep 22 2021, 8:33 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE

Sep 21 2021

jijiki updated subscribers of T291095: Deploy Scap version 4.0.2.

Thanks @dancy, I will try to get it done this week with @Arnoldokoth

Sep 21 2021, 1:58 PM · Release-Engineering-Team (Doing), serviceops, Scap
jijiki added a comment to T291385: TCP retransmissions in eqiad and codfw.
Sep 21 2021, 7:08 AM · serviceops, netops, Infrastructure-Foundations

Sep 20 2021

jijiki added a project to T291385: TCP retransmissions in eqiad and codfw: SRE.
Sep 20 2021, 2:09 PM · serviceops, netops, Infrastructure-Foundations
jijiki created T291385: TCP retransmissions in eqiad and codfw.
Sep 20 2021, 1:53 PM · serviceops, netops, Infrastructure-Foundations
jijiki updated subscribers of T280497: Benchmark performance of MediaWiki on k8s.

@ssastry we have done some benchmarks, but none of those were Parsoid URLs. It would be great if you could provide a couple of Parsoid URLs you'd like us to test.

Sep 20 2021, 12:19 PM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki updated the task description for T280497: Benchmark performance of MediaWiki on k8s.
Sep 20 2021, 12:08 PM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki updated subscribers of T291052: Deploy PHP patch for DOM replaceChild/removeChild performance.
Sep 20 2021, 11:55 AM · Patch-For-Review, SRE, serviceops
jijiki added a comment to T291052: Deploy PHP patch for DOM replaceChild/removeChild performance.

We'll first roll out on our canaries and 5 parsoid servers, and continue with full roll out tomorrow.

Sep 20 2021, 11:49 AM · Patch-For-Review, SRE, serviceops
jijiki updated the task description for T283056: Create a mwdebug deployment for mediawiki on kubernetes.
Sep 20 2021, 5:25 AM · Patch-For-Review, User-jijiki, MW-on-K8s, serviceops, SRE

Sep 16 2021

jijiki updated subscribers of T290959: Phabricator failed to generate thumbnails for some 800-900KB files.
Sep 16 2021, 1:34 PM · Phabricator
jijiki added a comment to T290959: Phabricator failed to generate thumbnails for some 800-900KB files.

The same thing happened with 1.2 MB files. I think the problem is with thumbnails of files of a certain size and up.

Sep 16 2021, 1:33 PM · Phabricator
jijiki added a comment to T280497: Benchmark performance of MediaWiki on k8s.

Last set of benchmarks of Round 1, we added a run with 6 pods x 8 workers, no tideways installed:

Sep 16 2021, 1:30 PM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki added a comment to T247697: Rethink mathoids SVG to PNG conversion.

Since Thumbor is being discussed here, I would like to point out a few things about Thumbor's situation and its infrastructure:

Sep 16 2021, 8:40 AM · Platform Engineering Roadmap Decision Making, Math, Wikimedia-SVG-rendering, User-Physikerwelt, Mathoid
jijiki updated the task description for T290536: Serve production traffic via Kubernetes.
Sep 16 2021, 5:16 AM · Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s

Sep 15 2021

jijiki updated the task description for T290536: Serve production traffic via Kubernetes.
Sep 15 2021, 8:19 PM · Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
jijiki added a project to T290536: Serve production traffic via Kubernetes: Performance-Team.
Sep 15 2021, 8:18 PM · Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s

Sep 14 2021

jijiki updated the task description for T280497: Benchmark performance of MediaWiki on k8s.
Sep 14 2021, 1:59 PM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki updated the task description for T280497: Benchmark performance of MediaWiki on k8s.
Sep 14 2021, 10:55 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki created T290959: Phabricator failed to generate thumbnails for some 800-900KB files.
Sep 14 2021, 10:44 AM · Phabricator
jijiki added a comment to T280497: Benchmark performance of MediaWiki on k8s.

Last round of urls, same configuration, with the addition of a couple more requests: gerrit: 720061, where we set y=0. We get a better idea of how marginal differences are at low concurrencies in most workloads:

Sep 14 2021, 10:39 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE

Sep 13 2021

jijiki updated the task description for T280497: Benchmark performance of MediaWiki on k8s.
Sep 13 2021, 5:55 PM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki added a comment to T280497: Benchmark performance of MediaWiki on k8s.

After round 1 fixes, we ran another set of 10k requests with and without xhprof. Results can be found here: https://people.wikimedia.org/~jiji/benchmarks-round1-all/. We got mixed results. As a general pattern, I will go out on a limb and say that at low concurrencies (c < 20) baremetal performs marginally better than or similar to kubernetes, while at higher concurrencies (c > 20) kubernetes performs better.

Sep 13 2021, 5:53 PM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE
jijiki updated the task description for T280497: Benchmark performance of MediaWiki on k8s.
Sep 13 2021, 12:13 PM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE

Sep 12 2021

jijiki updated the task description for T280497: Benchmark performance of MediaWiki on k8s.
Sep 12 2021, 8:00 AM · Patch-For-Review, Performance-Team (Radar), MW-on-K8s, serviceops, SRE

Sep 9 2021

jijiki added a comment to T289657: Decommission mc[1019-1023,1025-1026,1028-1036].eqiad.wmnet.

@Cmjohnson You can now remove any of the remaining hosts at any given time, thank you!

Sep 9 2021, 10:23 AM · SRE, ops-eqiad, decommission-hardware
jijiki updated subscribers of T281618: decommission mc1027.eqiad.wmnet.
Sep 9 2021, 9:24 AM · SRE, ops-eqiad, serviceops, decommission-hardware
jijiki assigned T281618: decommission mc1027.eqiad.wmnet to Cmjohnson.
Sep 9 2021, 9:23 AM · SRE, ops-eqiad, serviceops, decommission-hardware
jijiki updated the task description for T281618: decommission mc1027.eqiad.wmnet.
Sep 9 2021, 9:23 AM · SRE, ops-eqiad, serviceops, decommission-hardware