
ori (Ori Livneh)
Senior Grepper

User Details

User Since
Oct 3 2014, 4:18 AM (491 w, 1 d)
Availability
Available
IRC Nick
ori
LDAP User
Ori
MediaWiki User
ATDT [ Global Accounts ]

Recent Activity

Feb 1 2024

ori added a comment to T310087: Advance declaration of query parameters.

In lieu of exporting a route map, MediaWiki could, as a first pass at the problem, emit a response header that signals to the CDN that a request contained garbage parameters. The CDN could use this information to throttle clients that issue too many such requests. This may be less desirable than filtering all such requests at the edge, but it is also simpler.

Feb 1 2024, 10:16 PM · SRE, Traffic, MediaWiki-General
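The header-signalling idea above can be sketched as a CDN-side sliding-window counter. This is a minimal sketch that makes several assumptions: the header name `X-MW-Unknown-Query-Params` is hypothetical (MediaWiki does not emit it), and the class, its `limit`/`window_seconds` parameters and the `client_id` key are illustrative, not an existing CDN API.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Mapping, Optional

# Hypothetical header name: MediaWiki does not emit this today.
GARBAGE_PARAMS_HEADER = "X-MW-Unknown-Query-Params"


class GarbageParamThrottle:
    """Sliding-window count of flagged responses per client (CDN-side sketch)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_response(self, client_id: str, headers: Mapping[str, str],
                        now: Optional[float] = None) -> None:
        """Count a backend response if it carries the flag header (names are case-insensitive)."""
        if not any(name.lower() == GARBAGE_PARAMS_HEADER.lower() for name in headers):
            return
        self._events[client_id].append(time.monotonic() if now is None else now)

    def should_throttle(self, client_id: str, now: Optional[float] = None) -> bool:
        """True if the client produced more than `limit` flagged responses in the window."""
        now = time.monotonic() if now is None else now
        events = self._events[client_id]
        # Forget flagged responses that have slid out of the window.
        while events and now - events[0] > self.window:
            events.popleft()
        return len(events) > self.limit
```

What `client_id` should be (an IP, a hashed IP, or something else) is left open; that choice is part of the assumption.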
hashar awarded T310087: Advance declaration of query parameters a Like token.
Feb 1 2024, 8:22 PM · SRE, Traffic, MediaWiki-General

Sep 29 2023

ori updated the task description for T347660: Portable performance test representative of Wikimedia's production environment.
Sep 29 2023, 4:14 AM · Wikimedia-Performance-recommendation, Performance Issue, MediaWiki-Core-Benchmarker
ori created T347660: Portable performance test representative of Wikimedia's production environment.
Sep 29 2023, 4:13 AM · Wikimedia-Performance-recommendation, Performance Issue, MediaWiki-Core-Benchmarker

Sep 5 2023

ori committed rECHB91af468256e0: Remove unnecessary targets definitions from extension.json (authored by ori).
Remove unnecessary targets definitions from extension.json
Sep 5 2023, 12:42 PM

Aug 10 2023

ori closed T341471: A 'cache-control' header contains directives with invalid values: 'stale-while-revalidate=60' as Invalid.

It's a bug in webhint, AFAICT. It thinks stale-while-revalidate should not take a value, but that is wrong: RFC 5861 defines it as taking a delta-seconds value. This is the problematic code:

Aug 10 2023, 9:14 PM · MediaWiki-Platform-Team, Performance-Team, MediaWiki-ResourceLoader
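For reference, a correct check treats a bare `stale-while-revalidate` as the invalid form and `stale-while-revalidate=60` as valid. The sketch below is a simplified Cache-Control parser written for illustration; it is not webhint's code, and it ignores quoted arguments containing commas.

```python
import re
from typing import Dict, Optional

# delta-seconds: a non-negative integer number of seconds.
_DELTA_SECONDS = re.compile(r"[0-9]+")


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Map each Cache-Control directive to its argument, or None if it has none.

    Simplified: does not handle quoted-string arguments that contain commas.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def stale_while_revalidate_is_valid(directives: Dict[str, Optional[str]]) -> bool:
    """Per RFC 5861, stale-while-revalidate requires a delta-seconds argument."""
    if "stale-while-revalidate" not in directives:
        return True
    arg = directives["stale-while-revalidate"]
    return arg is not None and _DELTA_SECONDS.fullmatch(arg) is not None
```

The `public, max-age=300` part of the header in the test below is a hypothetical example; only `stale-while-revalidate=60` comes from the task.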
ori closed T244711: wmerrors needs tests as Resolved.
Aug 10 2023, 8:35 PM · Performance-Team, Test-Coverage, php-wmerrors

Aug 6 2023

Smjalageri awarded Image Macro "shits-on-fire" a Like token.
Aug 6 2023, 7:36 AM

Jul 31 2023

ori added a comment to T211661: Automatically clean up unused thumbnails in Swift.

The other thing I can't quite leave alone: why are we being asked for some thumbnails so often? Shouldn't the CDN be caching thumbs? If we had served each thumb only once in that 24-hour period, we would have saved about 54 million requests to Swift (29% of the requests Swift served), which is non-trivial...

Most frequently served thumbs on that day (with request counts):

8924 wikipedia-commons-local-thumb.8e/8/8e/Edit_remove.svg/15px-Edit_remove.svg.png
8053 wikipedia-commons-local-thumb.2c/2/2c/Broom_icon.svg/22px-Broom_icon.svg.png
6268 wikipedia-commons-local-thumb.de/d/de/Wynn.svg/25px-Wynn.svg.png
6264 wikipedia-commons-local-thumb.33/3/33/Crystal_Clear_action_viewmag.png/22px-Crystal_Clear_action_viewmag.png
6258 wikipedia-commons-local-thumb.1e/1/1e/Font_Awesome_5_solid_arrow-down.svg/19px-Font_Awesome_5_solid_arrow-down.svg.png
6256 wikipedia-commons-local-thumb.b2/b/b2/Font_Awesome_5_solid_arrow-up.svg/19px-Font_Awesome_5_solid_arrow-up.svg.png
5706 wikipedia-commons-local-thumb.b3/b/b3/Broom_icon_ref.svg/22px-Broom_icon_ref.svg.png
4990 wikipedia-commons-local-thumb.33/3/33/Crystal_Clear_action_viewmag.png/21px-Crystal_Clear_action_viewmag.png
Jul 31 2023, 12:46 PM · MediaWiki-Platform-Team (Radar), Performance Issue, Traffic, SRE-swift-storage, SRE
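The "served only once" estimate counts every request after the first for each object. The 54 million and 29% figures come from the full day of logs, which are not reproduced here; the sketch below only applies the same arithmetic to the eight thumbs listed above.

```python
# Request counts copied from the list above: the eight most-served thumbs only.
TOP_THUMB_COUNTS = {
    "15px-Edit_remove.svg.png": 8924,
    "22px-Broom_icon.svg.png": 8053,
    "25px-Wynn.svg.png": 6268,
    "22px-Crystal_Clear_action_viewmag.png": 6264,
    "19px-Font_Awesome_5_solid_arrow-down.svg.png": 6258,
    "19px-Font_Awesome_5_solid_arrow-up.svg.png": 6256,
    "22px-Broom_icon_ref.svg.png": 5706,
    "21px-Crystal_Clear_action_viewmag.png": 4990,
}


def redundant_requests(counts: dict) -> int:
    """Requests beyond the first for each object, i.e. those avoided if each were served once."""
    return sum(n - 1 for n in counts.values() if n > 0)


if __name__ == "__main__":
    print(sum(TOP_THUMB_COUNTS.values()))        # 52719
    print(redundant_requests(TOP_THUMB_COUNTS))  # 52711
```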

Jul 24 2023

ori added a comment to T211661: Automatically clean up unused thumbnails in Swift.

I also don't know how well Swift would handle 15k QPS of object metadata updates (cf. T211661#8377883)

Jul 24 2023, 3:53 PM · MediaWiki-Platform-Team (Radar), Performance Issue, Traffic, SRE-swift-storage, SRE
ori added a comment to T211661: Automatically clean up unused thumbnails in Swift.

Right. Now I remember. The initial expiration is indeed supposed to be set by Thumbor. The necessary functionality had some trouble landing in the Wikimedia Thumbor plugin repo, but it has since landed.

Jul 24 2023, 3:19 PM · MediaWiki-Platform-Team (Radar), Performance Issue, Traffic, SRE-swift-storage, SRE
ori added a comment to T211661: Automatically clean up unused thumbnails in Swift.

@MatthewVernon: my understanding is that rewrite.py is currently setting expiry headers for thumbnails on retrieval from Swift -- is that correct, and does that mean some thumbnails are already getting expired?

Jul 24 2023, 1:15 PM · MediaWiki-Platform-Team (Radar), Performance Issue, Traffic, SRE-swift-storage, SRE
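Swift's object expiration is driven by the `X-Delete-After` (relative seconds) and `X-Delete-At` (absolute Unix time) headers. A minimal sketch of the two steps discussed above, assuming a hypothetical 30-day retention: set an initial expiry when the thumbnail is first stored, and push it forward with a metadata update when the thumbnail is read. The function names and TTL are illustrative, not Thumbor's or rewrite.py's actual code.

```python
import time
from typing import Dict, Optional

# Hypothetical retention period; the real policy is not settled in this task.
THUMB_TTL_SECONDS = 30 * 24 * 3600


def initial_expiry_headers(ttl: int = THUMB_TTL_SECONDS) -> Dict[str, str]:
    """Headers for the PUT that stores a freshly rendered thumbnail.

    X-Delete-After is relative; Swift converts it to an absolute X-Delete-At.
    """
    return {"X-Delete-After": str(ttl)}


def refreshed_expiry_headers(now: Optional[float] = None,
                             ttl: int = THUMB_TTL_SECONDS) -> Dict[str, str]:
    """Headers for a metadata POST that pushes expiry forward when a thumb is read."""
    now = time.time() if now is None else now
    return {"X-Delete-At": str(int(now) + ttl)}
```

Each refresh on read is one object metadata update, which is exactly the load the 15k QPS concern above is about.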

May 8 2023

ori reopened T328842: Restructure paws away from special networking, a subtask of T328968: Revert changes in T328967, as Open.
May 8 2023, 5:29 AM · PAWS
ori reopened T328842: Restructure paws away from special networking, a subtask of T328971: Remove old ingress attach public IP to VM, as Open.
May 8 2023, 5:29 AM · PAWS
ori reopened T328842: Restructure paws away from special networking as "Open".
May 8 2023, 5:29 AM · PAWS
ori added a comment to T328842: Restructure paws away from special networking.

This is really confusing.

May 8 2023, 5:27 AM · PAWS

Apr 18 2023

ori added a comment to T334895: XSS via Graph extension.

Vega ships an optional interpreter that can evaluate graph expressions by traversing an AST and performing each operation, rather than relying on runtime code generation. Per https://github.com/vega/vega/pull/3019#issuecomment-749107902, the interpreter mode is not the default because it is 10% slower. That seems like a negligible price to me, and it looks like the only sensible option for keeping support for graph expressions while rooting out XSS vectors systematically.

Apr 18 2023, 3:51 PM · SecTeam-wikimedia-project-event, SecTeam-Processed, WMDE-TechWish-Sprint-2023-04-05, Editing-team, Vuln-XSS, MediaWiki-extensions-Graph, Security, Security-Team
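To illustrate why an AST interpreter closes off this class of XSS: nothing is ever turned back into executable code, and any node type outside a whitelist is rejected. Vega's interpreter is JavaScript; the sketch below is a Python analogue using the standard `ast` module with a deliberately tiny, hypothetical expression grammar.

```python
import ast
import operator

# Whitelisted operations; any other node in the tree is rejected.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP_OPS = {ast.Lt: operator.lt, ast.Gt: operator.gt, ast.Eq: operator.eq}


def evaluate(expression: str, datum: dict) -> object:
    """Parse an expression and walk its AST; never calls eval() or compiles code."""
    return _eval(ast.parse(expression, mode="eval").body, datum)


def _eval(node: ast.AST, datum: dict) -> object:
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str)):
        return node.value
    if isinstance(node, ast.Name):
        # Field lookup against the data record, never against interpreter globals.
        if node.id not in datum:
            raise NameError(f"unknown field: {node.id}")
        return datum[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left, datum), _eval(node.right, datum))
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in _CMP_OPS):
        return _CMP_OPS[type(node.ops[0])](_eval(node.left, datum),
                                           _eval(node.comparators[0], datum))
    # Calls, attribute access, subscripts, etc. are refused outright.
    raise ValueError(f"disallowed expression: {ast.dump(node)}")
```

The extra cost relative to code generation comes from dispatching on every node at evaluation time, which is the trade-off the 10% figure reflects.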

Mar 5 2023

ori added a comment to T330766: Decommission the EditorActivation instrument.

@phuedx I don't know, sorry.

Mar 5 2023, 6:56 AM · Data Engineering and Event Platform Team, MW-1.41-notes (1.41.0-wmf.2; 2023-03-27), Data-Engineering, Technical-Debt, MediaWiki-extensions-WikimediaEvents, Product-Analytics, Event-Platform

Feb 14 2023

ori added a comment to T327440: Post-deployment Vector 2022 metrics analysis on English Wikipedia.

Does the edits graph in T327440#8542723 include bots? Bots may not be a large proportion of users but they do contribute a large proportion of edits.

Feb 14 2023, 3:48 AM · Product-Analytics (Kanban), Web-Team-Backlog

Jan 13 2023

ori added a comment to T326607: Future of liuggio/statsd-php-client?.

+1 to @Tgr's proposal

Jan 13 2023, 1:49 AM · MediaWiki-libs-Stats, SRE Observability, observability, serviceops-radar, Grafana, Technical-Debt

Jan 10 2023

ori added a comment to T326607: Future of liuggio/statsd-php-client?.

It might be worth it to try and contact the library's co-maintainer. His contact info is at https://eatingco.de/about/.

Jan 10 2023, 3:07 AM · MediaWiki-libs-Stats, SRE Observability, observability, serviceops-radar, Grafana, Technical-Debt

Jan 9 2023

Tgr awarded T99268: RfC: Create a proper command-line runner for MediaWiki maintenance tasks a Love token.
Jan 9 2023, 7:05 AM · MW-1.40-notes (1.40.0-wmf.18; 2023-01-09), Wikimedia-Hackathon-2021, Platform Engineering Roadmap Decision Making, TechCom-RFC (TechCom-RFC-Closed), MediaWiki-Maintenance-system

Dec 23 2022

Volker_E awarded T99268: RfC: Create a proper command-line runner for MediaWiki maintenance tasks a Like token.
Dec 23 2022, 5:12 PM · MW-1.40-notes (1.40.0-wmf.18; 2023-01-09), Wikimedia-Hackathon-2021, Platform Engineering Roadmap Decision Making, TechCom-RFC (TechCom-RFC-Closed), MediaWiki-Maintenance-system

Nov 14 2022

ori added a comment to T322964: reviewer comments missing on a specific change.

Nov 14 2022, 12:02 AM · Gerrit (Gerrit 3.5)

Oct 18 2022

ori updated subscribers of T316706: Run user-submitted code under gVisor.

@Jdforrester-WMF: the Beta Cluster instance of the function-evaluator now runs under gVisor. Some additional work will be required to make the production instance of the function-evaluator run under gVisor. There is documentation here: https://gvisor.dev/docs/user_guide/quick_start/kubernetes/.

Oct 18 2022, 4:05 PM · Abstract Wikipedia team, function-evaluator
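On Kubernetes, running a workload under gVisor generally means a `RuntimeClass` whose handler is `runsc`, plus `runtimeClassName` on the pod. The sketch below assumes the production function-evaluator runs on Kubernetes with containerd configured for a `runsc` handler; the pod name and image are hypothetical placeholders. It emits a JSON `List` that `kubectl apply -f` can consume.

```python
import json


def runtime_class(name: str = "gvisor", handler: str = "runsc") -> dict:
    """RuntimeClass mapping a name to the node's runsc (gVisor) runtime handler."""
    return {"apiVersion": "node.k8s.io/v1", "kind": "RuntimeClass",
            "metadata": {"name": name}, "handler": handler}


def sandboxed_pod(pod_name: str, image: str, runtime_class_name: str = "gvisor") -> dict:
    """Pod that opts into the sandboxed runtime via runtimeClassName."""
    return {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": pod_name},
            "spec": {"runtimeClassName": runtime_class_name,
                     "containers": [{"name": pod_name, "image": image}]}}


def manifest_list(*items: dict) -> dict:
    """Bundle several objects into one v1 List document."""
    return {"apiVersion": "v1", "kind": "List", "items": list(items)}


if __name__ == "__main__":
    # Hypothetical image reference; substitute the real evaluator image.
    print(json.dumps(manifest_list(
        runtime_class(),
        sandboxed_pod("function-evaluator", "example.invalid/function-evaluator:latest"),
    ), indent=2))
```

The handler name must match whatever runtime name the container runtime on each node is configured with; `runsc` is the conventional one, not a guarantee.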
ori updated the task description for T316706: Run user-submitted code under gVisor.
Oct 18 2022, 3:58 PM · Abstract Wikipedia team, function-evaluator
ori added a comment to T275945: Create Wikifunctions.org.

I created a new task for the alerts, T321099. Let's continue there.

Oct 18 2022, 3:56 PM · Patch-For-Review, MW-1.41-notes (1.41.0-wmf.19; 2023-07-25), User-Urbanecm, Wiki-Setup (Create), Epic, Abstract Wikipedia team (Phase λ – Launch)
ori updated subscribers of T321099: ProbeSlow alerts for Wikifunctions on Beta Cluster.

Wikifunctions on the Beta Cluster uses the *.wikimedia.beta.wmflabs.org wildcard cert, and the CertAlmostExpired alert was caused by automatic certificate renewal being broken on the Beta Cluster in general. T293585 is the issue; it looks like Valentin and Giuseppe fixed it.

Oct 18 2022, 3:54 PM · Abstract Wikipedia team
ori created T321099: ProbeSlow alerts for Wikifunctions on Beta Cluster.
Oct 18 2022, 3:47 PM · Abstract Wikipedia team

Oct 14 2022

ori added a comment to T318258: Decommission the EditConflict instrument.

@phuedx I'm not aware of anything actively using it, no, but I'm also out of the loop -- can you ask someone on the performance team to confirm?

Oct 14 2022, 2:13 PM · MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Performance-Team (Radar), MediaWiki-extensions-WikimediaEvents

Oct 12 2022

ori placed T307742: Memoize Wikifunction functions calls in memcached up for grabs.
Oct 12 2022, 2:36 PM · MW-1.41-notes (1.41.0-wmf.20; 2023-08-01), Abstract Wikipedia team, MW-1.40-notes (1.40.0-wmf.13; 2022-12-05)
ori closed T307699: Formalize the semantics of the function model, a subtask of T296326: Discuss How to Implement Unions, as Resolved.
Oct 12 2022, 2:35 PM · Abstract Wikipedia team
ori closed T307699: Formalize the semantics of the function model as Resolved.
Oct 12 2022, 2:35 PM · 2022 Wikimedia Google.org Fellowship, Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T307699: Formalize the semantics of the function model.

Done by Ali: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Semantics_of_Wikifunctions

Oct 12 2022, 2:35 PM · 2022 Wikimedia Google.org Fellowship, Abstract Wikipedia team (Phase θ – Throttling)
ori closed T307700: Observability for function-* services as Resolved.
Oct 12 2022, 2:33 PM · Abstract Wikipedia team (Phase θ – Throttling), 2022 Wikimedia Google.org Fellowship, function-evaluator, function-orchestrator
ori closed T307820: Prototype Abstract Wikipedia in Scribunto as Resolved.
Oct 12 2022, 2:33 PM · Abstract Wikipedia team (Phase θ – Throttling), 2022 Wikimedia Google.org Fellowship
ori added a comment to T308250: Should Wikifunctions use a WebAssembly runtime?.

Relevant: Provably-Safe Multilingual Software Sandboxing using WebAssembly

Oct 12 2022, 2:32 PM · Abstract Wikipedia team, 2022 Wikimedia Google.org Fellowship
ori closed T310199: Select fastest correct implementation as Declined.
Oct 12 2022, 2:31 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori closed T310093: Investigate why function evaluation is slow as Resolved.
Oct 12 2022, 2:31 PM · Abstract Wikipedia team (Phase θ – Throttling), 2022 Wikimedia Google.org Fellowship
ori closed T314788: Performance analysis documentation for Wikifunctions as Declined.
Oct 12 2022, 2:30 PM · Abstract Wikipedia team (Phase θ – Throttling), 2022 Wikimedia Google.org Fellowship
ori added a comment to T316706: Run user-submitted code under gVisor.

I've cherry-picked the two Puppet patches on the beta cluster. The mediawiki-function-evaluator service is now running under gVisor.

Oct 12 2022, 2:24 PM · Abstract Wikipedia team, function-evaluator

Oct 11 2022

ori added a comment to T316879: Make gVisor packages available via apt.wikimedia.org.

Never mind, I see that it is available for Bullseye -- sorry.

Oct 11 2022, 3:12 PM · Patch-For-Review, Infrastructure-Foundations, serviceops
ori added a comment to T316879: Make gVisor packages available via apt.wikimedia.org.

@Joe the Wikifunctions Beta Cluster instance is running Bullseye -- could you also pull it in there?

Oct 11 2022, 3:04 PM · Patch-For-Review, Infrastructure-Foundations, serviceops

Sep 28 2022

ori committed rMSFS952d8c6ccb81: Update test coverage settings to define test files vs non-test files (authored by maryyang).
Update test coverage settings to define test files vs non-test files
Sep 28 2022, 8:11 PM

Sep 8 2022

ori closed T315019: HTTP 500 errors from Beta Cluster Wikifunctions health-check API endpoint as Resolved.

There are no outstanding issues that are specific to the Beta Cluster environment, AFAIK.

Sep 8 2022, 5:28 PM · MW-1.39-notes (1.39.0-wmf.27; 2022-08-29), Abstract Wikipedia team (Phase θ – Throttling)
ori closed T316886: Internal server error when calling function on NLG types as Resolved.
Sep 8 2022, 5:27 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori triaged T315403: Framework for running experiments on a subset of the app server fleet as Low priority.
Sep 8 2022, 2:31 PM · serviceops, SRE, Observability-Logging, Observability-Metrics

Sep 7 2022

ori committed rOSPU7e7f1fd1bf1b: Sort query parameters in URLs (authored by Joe).
Sort query parameters in URLs
Sep 7 2022, 2:58 PM
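Sorting query parameters makes `?title=X&action=history` and `?action=history&title=X` share one cache key. A minimal sketch with `urllib.parse`: it does a stable sort on the parameter name only, so repeated names keep their relative order, and it drops empty segments from `&&`. Both are design assumptions; whether the production VCL makes the same choices is not something this sketch confirms.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_query(url: str) -> str:
    """Return the URL with its query parameters sorted by name (stable sort)."""
    parts = urlsplit(url)
    if not parts.query:
        return url
    # Keep raw "name=value" segments so percent-encoding is preserved byte-for-byte.
    pairs = [p for p in parts.query.split("&") if p]
    # Sort on the name only; sorted() is stable, so repeated names keep their order.
    pairs.sort(key=lambda p: p.split("=", 1)[0])
    return urlunsplit(parts._replace(query="&".join(pairs)))
```

Purges have to go through the same normalization as requests: if a purge is issued for the unsorted URL while the cached object is keyed on the sorted one, the purge misses. That is one plausible mechanism for stale history pages like those reported in T317064.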

Sep 6 2022

ori closed T285312: Enable Logging in Backend Services, a subtask of T299598: Add security limits to the Wikifunctions system to maintain stability and integrity of the content, as Resolved.
Sep 6 2022, 4:02 PM · Abstract Wikipedia team (Phase λ – Launch), Epic
ori closed T285312: Enable Logging in Backend Services as Resolved.

@cmassaro We have some logging now, and instructions on Wikitech on how to access the logs. I think there are more places where we can add additional logging to make debugging easier, but that is better dealt with on an ongoing basis than a dedicated task.

Sep 6 2022, 4:02 PM · Abstract Wikipedia team (Phase θ – Throttling), function-evaluator, function-orchestrator
ori closed T290700: Use a Proper Logging Module in Orchestrator as Resolved.
Sep 6 2022, 3:59 PM · Patch-For-Review, Abstract Wikipedia Fix-It tasks, Abstract Wikipedia team (Phase θ – Throttling), function-orchestrator
ori closed T290700: Use a Proper Logging Module in Orchestrator, a subtask of T285312: Enable Logging in Backend Services, as Resolved.
Sep 6 2022, 3:58 PM · Abstract Wikipedia team (Phase θ – Throttling), function-evaluator, function-orchestrator
ori updated subscribers of T317064: History pages' caches not being invalidated after edits.

I suspect this is fallout from the URL query sorting change (cc @ori) not invalidating the cache of history pages properly.

Sep 6 2022, 3:44 AM · Patch-For-Review, Performance-Team (Radar), MediaWiki-Core-HTTP-Cache, SRE, Regression, Traffic, MediaWiki-Page-history

Sep 2 2022

ori added a comment to T132418: Evaluate using 'stale-while-revalidate' HTTP cache control.

Chrome is shipping this as of Chrome 75. Time to reconsider!

Sep 2 2022, 7:02 PM · MW-1.40-notes (1.40.0-wmf.14; 2022-12-12), Patch-For-Review, MediaWiki-ResourceLoader, Performance-Team
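The behaviour being reconsidered: within `max-age` a response is fresh; for up to `stale-while-revalidate` seconds after that, a cache may serve the stale copy while revalidating in the background; beyond that it must go to the origin. A simplified sketch of that decision (ages in seconds; other directives such as `no-cache` are ignored):

```python
from enum import Enum


class CacheDecision(Enum):
    FRESH = "serve from cache"
    STALE_REVALIDATE = "serve stale copy, revalidate in background"
    MUST_FETCH = "fetch from origin before responding"


def decide(age: int, max_age: int, stale_while_revalidate: int = 0) -> CacheDecision:
    """Classify a cached response by its age, following RFC 5861 semantics."""
    if age < max_age:
        return CacheDecision.FRESH
    staleness = age - max_age
    if stale_while_revalidate > 0 and staleness <= stale_while_revalidate:
        return CacheDecision.STALE_REVALIDATE
    return CacheDecision.MUST_FETCH
```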
ori added a comment to T316886: Internal server error when calling function on NLG types.

Can we see the function call that was sent? Even just copying the ZObject from expert mode in the UI will help.

Sep 2 2022, 5:25 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T316886: Internal server error when calling function on NLG types.

That no longer looks like an error that would be specific to the Beta cluster environment. @AAssaf-WMF, can you see if you get the same error locally?

Sep 2 2022, 3:19 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T316886: Internal server error when calling function on NLG types.

The API Sandbox request in the task description is still failing, but the underlying error is now different:

Sep 2 2022, 3:09 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T316859: Usage of babel?.

I think you can create a patch to remove it from package.json, and we'll see if all the integration tests pass. If anything breaks after merge we can always revert easily.

Sep 2 2022, 2:26 PM · Abstract Wikipedia team (Phase κ – Clean-up), WikiLambda
ori committed rMSFO51c33058d39a: Set custom User-Agent header on requests (authored by ori).
Set custom User-Agent header on requests
Sep 2 2022, 2:20 PM
ori updated the task description for T310880: Post-creation work for pcmwiki.
Sep 2 2022, 2:10 PM · Wiki-Setup
ori committed rGRBD6d707c562a45: Add pcmwiki to RESTBase (authored by ori).
Add pcmwiki to RESTBase
Sep 2 2022, 1:29 PM
Joe awarded T316706: Run user-submitted code under gVisor a Love token.
Sep 2 2022, 5:15 AM · Abstract Wikipedia team, function-evaluator

Sep 1 2022

ori renamed T239609: The N'Ko language cannot be looked up by its English name in the languages search box on Mobile web from The N'Ko language cannot be looked up by it's English name in the languages search box on Mobile web to The N'Ko language cannot be looked up by its English name in the languages search box on Mobile web.
Sep 1 2022, 8:39 PM · Web-Team-Backlog, MobileFrontend
ori added a comment to T316886: Internal server error when calling function on NLG types.

OK, it looks like the default User-Agent string sent by node-fetch is blocked by Varnish:
https://github.com/wikimedia/puppet/blob/9843300dba/modules/varnish/templates/wikimedia-frontend.vcl.erb#L716-L718
We need to set a custom user-agent string for the orchestrator.

Sep 1 2022, 7:26 PM · Abstract Wikipedia team (Phase θ – Throttling)
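The fix is to stop sending the library's default User-Agent and send one that identifies the client; Wikimedia's User-Agent policy asks for an informative string with contact details. The orchestrator itself is Node.js and uses node-fetch; the sketch below only illustrates the same idea with Python's `urllib.request`, and the UA string is a hypothetical placeholder.

```python
import urllib.request

# Hypothetical UA string: name the client and give a way to contact its operators.
ORCHESTRATOR_USER_AGENT = (
    "wikifunctions-function-orchestrator/0.0 (https://example.invalid/contact)"
)


def build_api_request(url: str,
                      user_agent: str = ORCHESTRATOR_USER_AGENT) -> urllib.request.Request:
    """Build (but do not send) a request that overrides the library-default User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

`urllib.request` stores header names capitalized, so the value is read back with `get_header("User-agent")`.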
ori added a comment to T316886: Internal server error when calling function on NLG types.

OK, I hacked in some debugging code to include the HTML body in the response, and it looks like the orchestrator is getting an error page with the message:

Sep 1 2022, 6:05 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T316886: Internal server error when calling function on NLG types.

It seems that the orchestrator is getting an invalid response from the MediaWiki API:

Sep 1 2022, 5:36 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T268678: Make Wikifunctions a true multi-lingual wiki, exposing content in each language to readers and search engines with parity.

When a page on a wiki is updated, MediaWiki sends purge requests to the CDN layer to invalidate objects in the cache. Currently, this is URL-based. So, for example, if I edit the article on 'Science' on enwiki, MediaWiki will send purge requests to Varnish for the following URLs:

Sep 1 2022, 4:48 PM · Abstract Wikipedia team, MW-1.41-notes (1.41.0-wmf.19; 2023-07-25)
ori reopened T315019: HTTP 500 errors from Beta Cluster Wikifunctions health-check API endpoint as "Open".

We're seeing errors again.

Sep 1 2022, 4:15 PM · MW-1.39-notes (1.39.0-wmf.27; 2022-08-29), Abstract Wikipedia team (Phase θ – Throttling)
ori updated the task description for T316879: Make gVisor packages available via apt.wikimedia.org.
Sep 1 2022, 3:50 PM · Patch-For-Review, Infrastructure-Foundations, serviceops
ori created T316879: Make gVisor packages available via apt.wikimedia.org.
Sep 1 2022, 3:50 PM · Patch-For-Review, Infrastructure-Foundations, serviceops

Aug 30 2022

ori added a project to T316706: Run user-submitted code under gVisor: Abstract Wikipedia team.
Aug 30 2022, 7:35 PM · Abstract Wikipedia team, function-evaluator
ori created T316706: Run user-submitted code under gVisor.
Aug 30 2022, 7:34 PM · Abstract Wikipedia team, function-evaluator
ori committed rMSFO183e58d8484d: Centralize access to a shared logger object (authored by ori).
Centralize access to a shared logger object
Aug 30 2022, 5:23 PM
ori closed T138093: Investigate query parameter normalization for MW/services as Resolved.

This is now rolled out for text frontends.

Aug 30 2022, 4:07 PM · MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Patch-For-Review, Traffic-Icebox, Platform Team Legacy (Watching / External), Services (watching), SRE, MediaWiki-General
ori updated the task description for T314868: Roll out query parameter normalization.
Aug 30 2022, 3:52 PM · MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review, Traffic, SRE, MediaWiki-General
ori added a comment to T314868: Roll out query parameter normalization.

This is now complete. Many thanks to @Vgutierrez for partnering with me to get this rolled out.

Aug 30 2022, 2:51 PM · MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review, Traffic, SRE, MediaWiki-General
ori closed T314868: Roll out query parameter normalization, a subtask of T138093: Investigate query parameter normalization for MW/services, as Resolved.
Aug 30 2022, 2:35 PM · MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Patch-For-Review, Traffic-Icebox, Platform Team Legacy (Watching / External), Services (watching), SRE, MediaWiki-General
ori closed T314868: Roll out query parameter normalization as Resolved.
Aug 30 2022, 2:34 PM · MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review, Traffic, SRE, MediaWiki-General

Aug 28 2022

ori added a comment to T315398: Set MW appserver scaling_governor to performance.

I tried setting EPP to 0 using x86_energy_perf_policy, thinking that bypassing the sysfs interface and writing directly to the MSR would make the setting sticky. Unfortunately this does not seem to be the case -- the EPP is gradually reset to 128, same as when you tried changing it via sysfs. At this point I also don't see value in further experimentation with the EPP knob and agree that performance is the way to go.

Aug 28 2022, 9:14 PM · Performance-Team (Radar), SRE
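Checking whether an EPP setting has drifted back comes down to reading the per-CPU intel_pstate sysfs files. A minimal sketch, assuming the standard `scaling_governor` and `energy_performance_preference` files under `/sys/devices/system/cpu/cpuN/cpufreq/`. The kernel reports EPP as a name where one matches (for example `performance` for 0 and, typically, `balance_performance` for 128), so `expected_epp` should be the string the kernel prints for the desired value. The root directory is a parameter so the check can be run against a copy of sysfs.

```python
from pathlib import Path
from typing import Dict, List

_FILES = (("governor", "scaling_governor"), ("epp", "energy_performance_preference"))


def read_cpufreq_settings(cpu_root: Path = Path("/sys/devices/system/cpu")) -> Dict[str, Dict[str, str]]:
    """Return {"cpuN": {"governor": ..., "epp": ...}} for every CPU with a cpufreq dir."""
    settings: Dict[str, Dict[str, str]] = {}
    for cpufreq in sorted(cpu_root.glob("cpu[0-9]*/cpufreq")):
        entry: Dict[str, str] = {}
        for key, filename in _FILES:
            path = cpufreq / filename
            if path.exists():  # EPP is absent when HWP is not in use
                entry[key] = path.read_text().strip()
        settings[cpufreq.parent.name] = entry
    return settings


def drifted(settings: Dict[str, Dict[str, str]], expected_epp: str) -> List[str]:
    """CPUs whose reported EPP differs from the expected value."""
    return sorted(cpu for cpu, s in settings.items() if s.get("epp") != expected_epp)
```

Running this periodically after a change is one way to see the reset to 128 described above happen over time.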

Aug 26 2022

ori added a comment to T315398: Set MW appserver scaling_governor to performance.

Actually, let me not step on your toes. But if you can tolerate a short extension of this task, I would very much like to see this setting tested. I think there is a good chance it will give the same or very similar performance increase with less waste of power. Just to be fully explicit, the setting is:

Aug 26 2022, 11:12 AM · Performance-Team (Radar), SRE
ori added a comment to T315398: Set MW appserver scaling_governor to performance.

So 'powersave' with EPP=0 gives a broader range of operating frequencies than 'performance'. We should see if in this mode the frequency scaling is still responsive enough for the workload.

Aug 26 2022, 7:33 AM · Performance-Team (Radar), SRE

Aug 22 2022

ori updated the task description for T314868: Roll out query parameter normalization.
Aug 22 2022, 9:55 AM · MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review, Traffic, SRE, MediaWiki-General

Aug 20 2022

ori added a comment to T290700: Use a Proper Logging Module in Orchestrator.

Unfortunately, continuation-local-storage and its more modern counterpart, AsyncLocalStorage, come with a substantial performance cost, particularly for workloads with a lot of async/await calls. I don't think we can afford the performance penalty.

Aug 20 2022, 1:58 AM · Patch-For-Review, Abstract Wikipedia Fix-It tasks, Abstract Wikipedia team (Phase θ – Throttling), function-orchestrator
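The alternative to async-local context is to pass request-scoped context explicitly. The sketch below shows that pattern in Python (the orchestrator is Node.js): one shared module-level logger, wrapped per request in a `logging.LoggerAdapter` that is handed down as an ordinary argument, so no context-propagation machinery runs on each await. Whether the orchestrator change did exactly this is an assumption; the function names are hypothetical.

```python
import asyncio
import logging
from typing import List

# Single shared logger for the service.
LOGGER = logging.getLogger("orchestrator")


def request_logger(request_id: str) -> logging.LoggerAdapter:
    """Bind per-request context explicitly instead of via async-local storage."""
    return logging.LoggerAdapter(LOGGER, {"request_id": request_id})


async def evaluate_step(step: str, log: logging.LoggerAdapter) -> str:
    # The logger travels as a plain argument through each await.
    log.info("evaluating %s", step)
    await asyncio.sleep(0)
    return step.upper()


async def handle_request(request_id: str, steps: List[str]) -> List[str]:
    log = request_logger(request_id)
    return [await evaluate_step(step, log) for step in steps]
```

The cost is some extra plumbing in function signatures; the benefit is that nothing has to track context across every async boundary.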

Aug 19 2022

ori added a comment to T252719: Upgrade thumbor to Thumbor 7 and python3.

@Vlad.shapik thank you, but what about the other points I raised?

Aug 19 2022, 3:23 PM · Patch-For-Review, Thumbor Migration, Python3-Porting
ori added a parent task for T240685: MediaWiki Prometheus support: T315403: Framework for running experiments on a subset of the app server fleet.
Aug 19 2022, 2:25 PM · SRE Observability (FY2023/2024-Q3), MW-1.41-notes (1.41.0-wmf.28; 2023-09-26), MW-1.40-notes (1.40.0-wmf.27; 2023-03-13), MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), MediaWiki-libs-Stats, Platform Team Workboards (External Code Reviews), serviceops, SRE, MediaWiki-General, observability
ori added a subtask for T315403: Framework for running experiments on a subset of the app server fleet: T240685: MediaWiki Prometheus support.
Aug 19 2022, 2:25 PM · serviceops, SRE, Observability-Logging, Observability-Metrics
ori awarded T314453: Switch ChronologyProtector from redis to memcached a Doubloon token.
Aug 19 2022, 2:20 PM · serviceops, Performance-Team

Aug 18 2022

ori added a comment to T315398: Set MW appserver scaling_governor to performance.

Congratulations, this is a huge win! I think we should dig deeper to see if we can get the same or similar performance benefit, but waste less power.

Aug 18 2022, 8:21 PM · Performance-Team (Radar), SRE
ori added a comment to T252719: Upgrade thumbor to Thumbor 7 and python3.

Thanks @roman-stolar. I think it's a mistake to combine (a) changes from (multiple?) upstream(s), (b) unmerged changes from Gerrit, and (c) your own work into a single commit, as in Icabc39dab. This destroys some useful history (for example, the authorship, review comments and discussion on Id6ec6d62c), and it makes future reconciliation with upstream code harder. It's also error-prone.

Aug 18 2022, 3:40 PM · Patch-For-Review, Thumbor Migration, Python3-Porting
ori added a comment to T252719: Upgrade thumbor to Thumbor 7 and python3.

@Vlad.shapik, @roman-stolar, ping on the above :)
I'd also like to understand the deployment plan for this. Are you working with anyone on Wikimedia SRE to get this deployed? I strongly recommend deploying small, incremental updates rather than accumulating a lot of changes.

Aug 18 2022, 2:42 PM · Patch-For-Review, Thumbor Migration, Python3-Porting
lmata awarded T315403: Framework for running experiments on a subset of the app server fleet a Like token.
Aug 18 2022, 12:40 PM · serviceops, SRE, Observability-Logging, Observability-Metrics

Aug 17 2022

ori updated the task description for T314868: Roll out query parameter normalization.
Aug 17 2022, 4:22 PM · MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review, Traffic, SRE, MediaWiki-General
ori added a comment to T315398: Set MW appserver scaling_governor to performance.

Based on https://www.kernel.org/doc/html/v5.6/admin-guide/pm/intel_pstate.html#operation-modes the scaling behavior will be different for systems depending on whether or not hardware-managed P-states (HWP) support is available and enabled. It looks like it is not available on 56 out of 265 app servers: P32411.

Aug 17 2022, 5:53 AM · Performance-Team (Radar), SRE
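HWP support shows up as the `hwp` flag in `/proc/cpuinfo`. A minimal sketch for classifying hosts from that file's text; note that the flag indicates hardware support, and whether intel_pstate actually uses HWP can also depend on kernel command-line options such as `intel_pstate=no_hwp`.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the flag set of the first processor listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_hwp(cpuinfo_text: str) -> bool:
    """True if the exact 'hwp' flag is present (not just hwp_notify, hwp_epp, ...)."""
    return "hwp" in cpu_flags(cpuinfo_text)
```

On a host, `has_hwp(open("/proc/cpuinfo").read())` gives the per-server answer that a fleet-wide survey can aggregate.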
ori created P32411 (An Untitled Masterwork).
Aug 17 2022, 5:51 AM
ori updated subscribers of T315403: Framework for running experiments on a subset of the app server fleet.
Aug 17 2022, 5:04 AM · serviceops, SRE, Observability-Logging, Observability-Metrics
ori updated the task description for T315403: Framework for running experiments on a subset of the app server fleet.
Aug 17 2022, 5:02 AM · serviceops, SRE, Observability-Logging, Observability-Metrics
ori updated the task description for T315403: Framework for running experiments on a subset of the app server fleet.
Aug 17 2022, 5:01 AM · serviceops, SRE, Observability-Logging, Observability-Metrics
ori created T315403: Framework for running experiments on a subset of the app server fleet.
Aug 17 2022, 5:01 AM · serviceops, SRE, Observability-Logging, Observability-Metrics
ori added a comment to T315398: Set MW appserver scaling_governor to performance.

I propose making this change on all eqiad appservers in soft state, with cumin. Our latency metrics are noisy so changing it everywhere at once will give us the best chance of measuring a benefit.

Aug 17 2022, 2:29 AM · Performance-Team (Radar), SRE
ori added a comment to T315350: Beta cluster Error: 502, Next Hop Connection Failed.

Follow-up items to get the Puppet repo on deployment-puppetmaster04 in good shape:

Aug 17 2022, 12:06 AM · User-Ryasmeen, Beta-Cluster-Infrastructure, Release-Engineering-Team, Wikimedia-Incident, SRE-OnFire