ori (Ori Livneh)
Senior Grepper

User Details

User Since
Oct 3 2014, 4:18 AM (442 w, 4 d)
Availability
Available
IRC Nick
ori
LDAP User
Ori
MediaWiki User
ATDT

Recent Activity

Sun, Mar 5

ori added a comment to T330766: Decommission the EditorActivation instrument.

@phuedx I don't know, sorry.

Sun, Mar 5, 6:56 AM · MW-1.41-notes (1.41.0-wmf.2; 2023-03-27), Patch-For-Review, Data-Engineering, Technical-Debt, MediaWiki-extensions-WikimediaEvents, Product-Analytics, Event-Platform Value Stream

Feb 14 2023

ori added a comment to T327440: Post-deployment Vector 2022 metrics analysis on English Wikipedia.

Does the edits graph in T327440#8542723 include bots? Bots may not be a large proportion of users but they do contribute a large proportion of edits.

Feb 14 2023, 3:48 AM · Product-Analytics (Kanban), Readers-Web-Backlog

Jan 13 2023

ori added a comment to T326607: Future of liuggio/statsd-php-client?.

+1 to @Tgr's proposal

Jan 13 2023, 1:49 AM · observability, serviceops-radar, Graphite, Technical-Debt

Jan 10 2023

ori added a comment to T326607: Future of liuggio/statsd-php-client?.

It might be worth it to try and contact the library's co-maintainer. His contact info is at https://eatingco.de/about/.

Jan 10 2023, 3:07 AM · observability, serviceops-radar, Graphite, Technical-Debt

Jan 9 2023

Tgr awarded T99268: RfC: Create a proper command-line runner for MediaWiki maintenance tasks a Love token.
Jan 9 2023, 7:05 AM · MW-1.40-notes (1.40.0-wmf.18; 2023-01-09), Wikimedia-Hackathon-2021, Patch-For-Review, Platform Engineering Roadmap Decision Making, TechCom-RFC (TechCom-RFC-Closed), MediaWiki-Maintenance-system

Dec 23 2022

Volker_E awarded T99268: RfC: Create a proper command-line runner for MediaWiki maintenance tasks a Like token.
Dec 23 2022, 5:12 PM · MW-1.40-notes (1.40.0-wmf.18; 2023-01-09), Wikimedia-Hackathon-2021, Patch-For-Review, Platform Engineering Roadmap Decision Making, TechCom-RFC (TechCom-RFC-Closed), MediaWiki-Maintenance-system

Nov 14 2022

ori added a comment to T322964: reviewer comments missing on a specific change.

Nov 14 2022, 12:02 AM · Gerrit (Gerrit 3.5)

Oct 18 2022

ori updated subscribers of T316706: Run user-submitted code under gVisor.

@Jdforrester-WMF : the Beta Cluster instance of the function-evaluator now runs under gVisor. Some additional work will be required to make the production instance of the function-evaluator run under gVisor. There is documentation here: https://gvisor.dev/docs/user_guide/quick_start/kubernetes/.

Oct 18 2022, 4:05 PM · function-evaluator, Abstract Wikipedia team (Phase θ – Throttling)
ori updated the task description for T316706: Run user-submitted code under gVisor.
Oct 18 2022, 3:58 PM · function-evaluator, Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T275945: Launch Wikifunctions.

I created a new task for the alerts, T321099. Let's continue there.

Oct 18 2022, 3:56 PM · Patch-For-Review, User-Urbanecm, Wiki-Setup (Create), Epic, Abstract Wikipedia team (Phase λ – Launch)
ori updated subscribers of T321099: ProbeSlow alerts for Wikifunctions on Beta Cluster.

Wikifunctions on the Beta Cluster uses the *.wikimedia.beta.wmflabs.org wildcard cert, and the CertAlmostExpired alert was caused by automatic certificate renewal being broken on the Beta Cluster in general. T293585 is the issue; it looks like Valentin and Giuseppe fixed it.

Oct 18 2022, 3:54 PM · Abstract Wikipedia team (Phase λ – Launch)
ori created T321099: ProbeSlow alerts for Wikifunctions on Beta Cluster.
Oct 18 2022, 3:47 PM · Abstract Wikipedia team (Phase λ – Launch)

Oct 14 2022

ori added a comment to T318258: Decommission the EditConflict instrument.

@phuedx I'm not aware of anything actively using it, no, but I'm also out of the loop -- can you ask someone on the performance team to confirm?

Oct 14 2022, 2:13 PM · MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Performance-Team (Radar), MediaWiki-extensions-WikimediaEvents

Oct 12 2022

ori placed T307742: Memoize Wikifunction functions calls in memcached up for grabs.
Oct 12 2022, 2:36 PM · Patch-For-Review, MW-1.40-notes (1.40.0-wmf.13; 2022-12-05), Abstract Wikipedia team (Phase θ – Throttling)
ori closed T307699: Formalize the semantics of the function model, a subtask of T296326: Discuss How to Implement Unions, as Resolved.
Oct 12 2022, 2:35 PM · Abstract Wikipedia team
ori closed T307699: Formalize the semantics of the function model as Resolved.
Oct 12 2022, 2:35 PM · 2022 Wikimedia Google.org Fellowship, Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T307699: Formalize the semantics of the function model.

Done by Ali: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Semantics_of_Wikifunctions

Oct 12 2022, 2:35 PM · 2022 Wikimedia Google.org Fellowship, Abstract Wikipedia team (Phase θ – Throttling)
ori closed T307700: Observability for function-* services as Resolved.
Oct 12 2022, 2:33 PM · Abstract Wikipedia team (Phase θ – Throttling), 2022 Wikimedia Google.org Fellowship, function-evaluator, function-orchestrator
ori closed T307820: Prototype Abstract Wikipedia in Scribunto as Resolved.
Oct 12 2022, 2:33 PM · Abstract Wikipedia team (Phase θ – Throttling), 2022 Wikimedia Google.org Fellowship
ori added a comment to T308250: Should Wikifunctions use a WebAssembly runtime?.

Relevant: Provably-Safe Multilingual Software Sandboxing using WebAssembly

Oct 12 2022, 2:32 PM · Abstract Wikipedia team, 2022 Wikimedia Google.org Fellowship
ori closed T310199: Select fastest correct implementation as Declined.
Oct 12 2022, 2:31 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori closed T310093: Investigate why function evaluation is slow as Resolved.
Oct 12 2022, 2:31 PM · Abstract Wikipedia team (Phase θ – Throttling), 2022 Wikimedia Google.org Fellowship
ori closed T314788: Performance analysis documentation for Wikifunctions as Declined.
Oct 12 2022, 2:30 PM · Abstract Wikipedia team (Phase θ – Throttling), 2022 Wikimedia Google.org Fellowship
ori added a comment to T316706: Run user-submitted code under gVisor.

I've cherry-picked the two Puppet patches on the beta cluster. The mediawiki-function-evaluator service is now running under gVisor.

Oct 12 2022, 2:24 PM · function-evaluator, Abstract Wikipedia team (Phase θ – Throttling)

Oct 11 2022

ori added a comment to T316879: Make gVisor packages available via apt.wikimedia.org.

Never mind, I see that it is available for Bullseye -- sorry.

Oct 11 2022, 3:12 PM · Patch-For-Review, Infrastructure-Foundations, serviceops
ori added a comment to T316879: Make gVisor packages available via apt.wikimedia.org.

@Joe the Wikifunctions Beta Cluster instance is running Bullseye -- could you also pull it in there?

Oct 11 2022, 3:04 PM · Patch-For-Review, Infrastructure-Foundations, serviceops

Sep 28 2022

ori committed rMSFS952d8c6ccb81: Update test coverage settings to define test files vs non-test files (authored by maryyang).
Sep 28 2022, 8:11 PM

Sep 8 2022

ori closed T315019: HTTP 500 errors from Beta Cluster Wikifunctions health-check API endpoint as Resolved.

There are no outstanding issues that are specific to the Beta Cluster environment, AFAIK.

Sep 8 2022, 5:28 PM · MW-1.39-notes (1.39.0-wmf.27; 2022-08-29), Abstract Wikipedia team (Phase θ – Throttling)
ori closed T316886: Internal server error when calling function on NLG types as Resolved.
Sep 8 2022, 5:27 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori triaged T315403: Framework for running experiments on a subset of the app server fleet as Low priority.
Sep 8 2022, 2:31 PM · serviceops, Performance-Team (Radar), SRE, Observability-Logging, Observability-Metrics

Sep 7 2022

ori committed rOSPU7e7f1fd1bf1b: Sort query parameters in URLs (authored by Joe).
Sep 7 2022, 2:58 PM

Sep 6 2022

ori closed T285312: Enable Logging in Backend Services, a subtask of T299598: Add security limits to the Wikifunctions system to maintain stability and integrity of the content, as Resolved.
Sep 6 2022, 4:02 PM · Epic, Abstract Wikipedia team (Phase θ – Throttling)
ori closed T285312: Enable Logging in Backend Services as Resolved.

@cmassaro We have some logging now, and instructions on Wikitech on how to access the logs. I think there are more places where we can add additional logging to make debugging easier, but that is better dealt with on an ongoing basis than a dedicated task.

Sep 6 2022, 4:02 PM · Abstract Wikipedia team (Phase θ – Throttling), function-evaluator, function-orchestrator
ori closed T290700: Use a Proper Logging Module in Orchestrator as Resolved.
Sep 6 2022, 3:59 PM · Patch-For-Review, Abstract Wikipedia Fix-It tasks, Abstract Wikipedia team (Phase θ – Throttling), function-orchestrator
ori closed T290700: Use a Proper Logging Module in Orchestrator, a subtask of T285312: Enable Logging in Backend Services, as Resolved.
Sep 6 2022, 3:58 PM · Abstract Wikipedia team (Phase θ – Throttling), function-evaluator, function-orchestrator
ori updated subscribers of T317064: History pages' caches not being invalidated after edits.

I suspect this is fallout from the URL query sorting change (cc @ori) not invalidating the cache of history pages properly.

Sep 6 2022, 3:44 AM · Patch-For-Review, Performance-Team (Radar), MediaWiki-Core-HTTP-Cache, SRE, Regression, Traffic, MediaWiki-Page-history
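
To illustrate the kind of mismatch suspected in the comment above, here is a minimal sketch of query parameter sorting. It is written in Python with the standard library, not in the language of the actual change, and the function name and example URLs are hypothetical. It is also an assumption that the regression works this way: if cached objects are keyed by the sorted URL while purges are sent for the unsorted form, the two no longer match.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sort_query_params(url: str) -> str:
    """Return url with its query parameters sorted by name (hypothetical helper)."""
    parts = urlsplit(url)
    # keep_blank_values=True so parameters such as "?debug=" are not dropped.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # list.sort is stable, so repeated parameter names keep their relative order.
    pairs.sort(key=lambda pair: pair[0])
    # Note: urlencode re-escapes values, so escaping may differ from the input.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Two spellings of the same (made-up) history URL normalize to one string.
unsorted_url = "https://example.org/w/index.php?title=Science&action=history"
sorted_url = "https://example.org/w/index.php?action=history&title=Science"
normalized = sort_query_params(unsorted_url)
```

Under this sketch, a purge would only hit the cached object if the purge URL went through the same normalization as the request URL.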

Sep 2 2022

ori added a comment to T132418: Evaluate using 'stale-while-revalidate' HTTP cache control.

Chrome is shipping this as of Chrome 75. Time to reconsider!

Sep 2 2022, 7:02 PM · MW-1.40-notes (1.40.0-wmf.14; 2022-12-12), Patch-For-Review, MediaWiki-ResourceLoader, Performance-Team
ori added a comment to T316886: Internal server error when calling function on NLG types.

Can we see the function call that was sent? Even just copying the ZObject from expert mode in the UI will help.

Sep 2 2022, 5:25 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T316886: Internal server error when calling function on NLG types.

That no longer looks like an error that would be specific to the Beta cluster environment. @AAssaf-WMF , can you see if you get the same error locally?

Sep 2 2022, 3:19 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T316886: Internal server error when calling function on NLG types.

The API Sandbox request in the task description is still failing, but the underlying error is now different:

Sep 2 2022, 3:09 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T316859: Usage of babel?.

I think you can create a patch to remove it from package.json, and we'll see if all the integration tests pass. If anything breaks after merge we can always revert easily.

Sep 2 2022, 2:26 PM · Abstract Wikipedia team (Phase κ – Clean-up), WikiLambda
ori committed rMSFO51c33058d39a: Set custom User-Agent header on requests (authored by ori).
Sep 2 2022, 2:20 PM
ori updated the task description for T310880: Post-creation work for pcmwiki.
Sep 2 2022, 2:10 PM · Wiki-Setup
ori committed rGRBD6d707c562a45: Add pcmwiki to RESTBase (authored by ori).
Sep 2 2022, 1:29 PM
Joe awarded T316706: Run user-submitted code under gVisor a Love token.
Sep 2 2022, 5:15 AM · function-evaluator, Abstract Wikipedia team (Phase θ – Throttling)

Sep 1 2022

ori renamed T239609: The N'Ko language cannot be looked up by its English name in the languages search box on Mobile web from The N'Ko language cannot be looked up by it's English name in the languages search box on Mobile web to The N'Ko language cannot be looked up by its English name in the languages search box on Mobile web.
Sep 1 2022, 8:39 PM · Readers-Web-Backlog, MobileFrontend
ori added a comment to T316886: Internal server error when calling function on NLG types.

OK, it looks like the default User-Agent string sent by node-fetch is blocked by Varnish:
https://github.com/wikimedia/puppet/blob/9843300dba/modules/varnish/templates/wikimedia-frontend.vcl.erb#L716-L718
We need to set a custom user-agent string for the orchestrator.

Sep 1 2022, 7:26 PM · Abstract Wikipedia team (Phase θ – Throttling)
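
The fix described in the comment above amounts to sending an explicit User-Agent header instead of the HTTP client's default. The orchestrator uses node-fetch; the sketch below swaps in Python's standard library purely for illustration, and the User-Agent value, helper name, and URL are all hypothetical.

```python
import urllib.request

# Hypothetical identifier; a real one would name the service and a contact point.
USER_AGENT = "example-orchestrator/0.1 (https://example.org/contact)"


def build_request(url: str) -> urllib.request.Request:
    """Build a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# The request is only constructed here, not sent.
request = build_request("https://example.org/w/api.php")
```

Setting the header in one helper keeps every outgoing request identifiable, so a block on a library's default User-Agent no longer applies.
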
ori added a comment to T316886: Internal server error when calling function on NLG types.

Ok, I hacked in some debugging code to include the HTML body in the response, and it looks like the orchestrator is getting an error page with the message:

Sep 1 2022, 6:05 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T316886: Internal server error when calling function on NLG types.

It seems that the orchestrator is getting an invalid response from the MediaWiki API:

Sep 1 2022, 5:36 PM · Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T268678: How do we expose content in each language to readers and search engines.

When a page on a wiki is updated, MediaWiki sends purge requests to the CDN layer to invalidate objects in the cache. Currently, this is URL-based. So, for example, if I edit the article on 'Science' on enwiki, MediaWiki will send purge requests to Varnish for the following URLs:

Sep 1 2022, 4:48 PM · Abstract Wikipedia team (Phase ι – Documentation)
ori reopened T315019: HTTP 500 errors from Beta Cluster Wikifunctions health-check API endpoint as "Open".

We're seeing errors again.

Sep 1 2022, 4:15 PM · MW-1.39-notes (1.39.0-wmf.27; 2022-08-29), Abstract Wikipedia team (Phase θ – Throttling)
ori updated the task description for T316879: Make gVisor packages available via apt.wikimedia.org.
Sep 1 2022, 3:50 PM · Patch-For-Review, Infrastructure-Foundations, serviceops
ori created T316879: Make gVisor packages available via apt.wikimedia.org.
Sep 1 2022, 3:50 PM · Patch-For-Review, Infrastructure-Foundations, serviceops

Aug 30 2022

ori added a project to T316706: Run user-submitted code under gVisor: Abstract Wikipedia team.
Aug 30 2022, 7:35 PM · function-evaluator, Abstract Wikipedia team (Phase θ – Throttling)
ori created T316706: Run user-submitted code under gVisor.
Aug 30 2022, 7:34 PM · function-evaluator, Abstract Wikipedia team (Phase θ – Throttling)
ori committed rMSFO183e58d8484d: Centralize access to a shared logger object (authored by ori).
Aug 30 2022, 5:23 PM
ori closed T138093: Investigate query parameter normalization for MW/services as Resolved.

This is now rolled out for text frontends.

Aug 30 2022, 4:07 PM · MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Patch-For-Review, Traffic-Icebox, Platform Team Legacy (Watching / External), Services (watching), SRE, MediaWiki-General
ori updated the task description for T314868: Roll out query parameter normalization.
Aug 30 2022, 3:52 PM · MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review, Traffic, SRE, MediaWiki-General
ori added a comment to T314868: Roll out query parameter normalization.

This is now complete. Many thanks to @Vgutierrez for partnering with me to get this rolled out.

Aug 30 2022, 2:51 PM · MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review, Traffic, SRE, MediaWiki-General
ori closed T314868: Roll out query parameter normalization, a subtask of T138093: Investigate query parameter normalization for MW/services, as Resolved.
Aug 30 2022, 2:35 PM · MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Patch-For-Review, Traffic-Icebox, Platform Team Legacy (Watching / External), Services (watching), SRE, MediaWiki-General
ori closed T314868: Roll out query parameter normalization as Resolved.
Aug 30 2022, 2:34 PM · MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review, Traffic, SRE, MediaWiki-General

Aug 28 2022

ori added a comment to T315398: Set MW appserver scaling_governor to performance.

I tried setting EPP to 0 using x86_energy_perf_policy, thinking that bypassing the sysfs interface and writing directly to the MSR would make the setting sticky. Unfortunately this does not seem to be the case -- the EPP is gradually reset to 128, same as when you tried changing it via sysfs. At this point I also don't see value in further experimentation with the EPP knob and agree that performance is the way to go.

Aug 28 2022, 9:14 PM · Performance-Team (Radar), SRE

Aug 26 2022

ori added a comment to T315398: Set MW appserver scaling_governor to performance.

Actually, let me not step on your toes. But if you can tolerate a short extension of this task, I would very much like to see this setting tested. I think there is a good chance it will give the same or very similar performance increase with less waste of power. Just to be fully explicit, the setting is:

Aug 26 2022, 11:12 AM · Performance-Team (Radar), SRE
ori added a comment to T315398: Set MW appserver scaling_governor to performance.

So 'powersave' with EPP=0 gives a broader range of operating frequencies than 'performance'. We should see if in this mode the frequency scaling is still responsive enough for the workload.

Aug 26 2022, 7:33 AM · Performance-Team (Radar), SRE

Aug 22 2022

ori updated the task description for T314868: Roll out query parameter normalization.
Aug 22 2022, 9:55 AM · MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review, Traffic, SRE, MediaWiki-General

Aug 20 2022

ori added a comment to T290700: Use a Proper Logging Module in Orchestrator.

Unfortunately, continuation-local-storage and its more modern counterpart, AsyncLocalStorage, come with a substantial performance cost, particularly for workloads with a lot of async/await calls. I don't think we can afford the performance penalty.

Aug 20 2022, 1:58 AM · Patch-For-Review, Abstract Wikipedia Fix-It tasks, Abstract Wikipedia team (Phase θ – Throttling), function-orchestrator

Aug 19 2022

ori added a comment to T252719: Upgrade thumbor to Thumbor 7 and python3.

@Vlad.shapik thank you, but what about the other points I raised?

Aug 19 2022, 3:23 PM · Patch-For-Review, Thumbor Migration, Python3-Porting
ori added a parent task for T240685: MediaWiki Prometheus support: T315403: Framework for running experiments on a subset of the app server fleet.
Aug 19 2022, 2:25 PM · MW-1.40-notes (1.40.0-wmf.27; 2023-03-13), MW-1.38-notes (1.38.0-wmf.19; 2022-01-24), MediaWiki-libs-Stats, Platform Team Workboards (External Code Reviews), Patch-For-Review, serviceops, SRE, MediaWiki-General, observability
ori added a subtask for T315403: Framework for running experiments on a subset of the app server fleet: T240685: MediaWiki Prometheus support.
Aug 19 2022, 2:25 PM · serviceops, Performance-Team (Radar), SRE, Observability-Logging, Observability-Metrics
ori awarded T314453: Switch ChronologyProtector from redis to memcached a Doubloon token.
Aug 19 2022, 2:20 PM · serviceops, Performance-Team

Aug 18 2022

ori added a comment to T315398: Set MW appserver scaling_governor to performance.

Congratulations, this is a huge win! I think we should dig deeper to see if we can get the same or similar performance benefit, but waste less power.

Aug 18 2022, 8:21 PM · Performance-Team (Radar), SRE
ori added a comment to T252719: Upgrade thumbor to Thumbor 7 and python3.

Thanks @roman-stolar. I think it's a mistake to combine (a) changes from (multiple?) upstream(s), (b) unmerged changes from Gerrit, and (c) your own work into a single commit, as in Icabc39dab. This destroys some useful history (for example, the authorship, review comments and discussion on Id6ec6d62c), and it makes future reconciliation with upstream code harder. It's also error-prone.

Aug 18 2022, 3:40 PM · Patch-For-Review, Thumbor Migration, Python3-Porting
ori added a comment to T252719: Upgrade thumbor to Thumbor 7 and python3.

@Vlad.shapik, @roman-stolar, ping on the above :)
I'd also like to understand the deployment plan for this. Are you working with anyone on Wikimedia SRE to get this deployed? I strongly recommend deploying small, incremental updates rather than accumulating a lot of changes.

Aug 18 2022, 2:42 PM · Patch-For-Review, Thumbor Migration, Python3-Porting
lmata awarded T315403: Framework for running experiments on a subset of the app server fleet a Like token.
Aug 18 2022, 12:40 PM · serviceops, Performance-Team (Radar), SRE, Observability-Logging, Observability-Metrics

Aug 17 2022

ori updated the task description for T314868: Roll out query parameter normalization.
Aug 17 2022, 4:22 PM · MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review, Traffic, SRE, MediaWiki-General
ori added a comment to T315398: Set MW appserver scaling_governor to performance.

Based on https://www.kernel.org/doc/html/v5.6/admin-guide/pm/intel_pstate.html#operation-modes the scaling behavior will be different for systems depending on whether or not hardware-managed P-states (HWP) support is available and enabled. It looks like it is not available on 56 out of 265 app servers: P32411.

Aug 17 2022, 5:53 AM · Performance-Team (Radar), SRE
ori created P32411 (An Untitled Masterwork).
Aug 17 2022, 5:51 AM
ori updated subscribers of T315403: Framework for running experiments on a subset of the app server fleet.
Aug 17 2022, 5:04 AM · serviceops, Performance-Team (Radar), SRE, Observability-Logging, Observability-Metrics
ori updated the task description for T315403: Framework for running experiments on a subset of the app server fleet.
Aug 17 2022, 5:02 AM · serviceops, Performance-Team (Radar), SRE, Observability-Logging, Observability-Metrics
ori updated the task description for T315403: Framework for running experiments on a subset of the app server fleet.
Aug 17 2022, 5:01 AM · serviceops, Performance-Team (Radar), SRE, Observability-Logging, Observability-Metrics
ori created T315403: Framework for running experiments on a subset of the app server fleet.
Aug 17 2022, 5:01 AM · serviceops, Performance-Team (Radar), SRE, Observability-Logging, Observability-Metrics
ori added a comment to T315398: Set MW appserver scaling_governor to performance.

I propose making this change on all eqiad appservers in soft state, with cumin. Our latency metrics are noisy so changing it everywhere at once will give us the best chance of measuring a benefit.

Aug 17 2022, 2:29 AM · Performance-Team (Radar), SRE
ori added a comment to T315350: Beta cluster Error: 502, Next Hop Connection Failed.

Follow-up items to get the Puppet repo on deployment-puppetmaster04 in good shape:

Aug 17 2022, 12:06 AM · User-Ryasmeen, Beta-Cluster-Infrastructure, Release-Engineering-Team, Wikimedia-Incident, SRE-OnFire

Aug 16 2022

ori updated subscribers of T315379: (Beta cluster) Running logspam-watch on deployment-mwlog01 gives repeated `Use of uninitialized value $host` errors.

The Puppet repo on deployment-puppetmaster04:/var/lib/git/operations/puppet is in MERGING state. There's an unresolved conflict in modules/profile/manifests/etcd/v3.pp. The conflict is between the upstream change I04aa7729e and a local patch, Iecfc26a94, which has been cherry-picked locally for the past year but never merged upstream.

Aug 16 2022, 10:56 PM · SRE-OnFire, Sustainability (Incident Followup), Beta-Cluster-Infrastructure
ori created P32410 Beta Cluster puppetmaster state on 2022-08-16.
Aug 16 2022, 10:54 PM
ori updated the task description for T314868: Roll out query parameter normalization.
Aug 16 2022, 8:31 PM · MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review, Traffic, SRE, MediaWiki-General
ori added a comment to T252719: Upgrade thumbor to Thumbor 7 and python3.

Hi, I'm trying to understand https://gerrit.wikimedia.org/r/c/operations/software/thumbor-plugins/+/800170/ a bit better.

  • What parts of this change are coming from upstream and what parts are new?
  • How did https://gerrit.wikimedia.org/r/c/operations/software/thumbor-plugins/+/489022/6 end up getting included, when it has not (AFAICT) been merged?
  • Did the new code get thoroughly reviewed? I only looked at one file, tests/integration/test_swift.py, and it looks like the change made at least some of the test code unreachable -- adding assert False to mock_get_object() does not result in a test failure.
Aug 16 2022, 5:54 PM · Patch-For-Review, Thumbor Migration, Python3-Porting
ori updated the task description for T314868: Roll out query parameter normalization.
Aug 16 2022, 1:32 PM · MW-1.39-notes (1.39.0-wmf.23; 2022-08-01), Patch-For-Review, Traffic, SRE, MediaWiki-General

Aug 15 2022

ori assigned T285312: Enable Logging in Backend Services to maryyang.
Aug 15 2022, 7:38 PM · Abstract Wikipedia team (Phase θ – Throttling), function-evaluator, function-orchestrator
ori closed T307722: Define SLIs and SLOs for function-* services as Resolved.

Published to https://wikitech.wikimedia.org/wiki/Wikifunctions/Performance_observability.

Aug 15 2022, 7:00 PM · Abstract Wikipedia team (Phase θ – Throttling), serviceops-radar, 2022 Wikimedia Google.org Fellowship, function-evaluator, function-orchestrator
ori updated subscribers of T310728: Emit a log from the orchestrator and evaluator on request result.

@maryyang are you able to take this on as part of the work on logging?

Aug 15 2022, 6:14 PM · Abstract Wikipedia team (Phase θ – Throttling), function-evaluator, function-orchestrator
ori added a comment to T185233: Modern Event Platform.

EventLogging is home grown, and was not designed for purposes other than low volume analytics in MySQL databases.

Aug 15 2022, 1:40 AM · Data-Engineering-Planning, Platform Team Workboards (Initiatives), Platform Team Initiatives (Modern Event Platform (TEC2)), Goal, Services (watching), MediaWiki-extensions-EventLogging, Event-Platform Value Stream, Analytics-Kanban

Aug 14 2022

ori created P32381 [[User:DragonflySixtyseven/Orphaned images]] at rev 1035233715.
Aug 14 2022, 4:45 PM

Aug 12 2022

ori added a comment to T315056: arclamp_generate_svgs OOMs.

The alerts are going to #wikimedia-operations; there were 21 alerts of this form on 2022-08-11:

Aug 12 2022, 1:52 PM · Performance-Team, observability, Arc-Lamp
ori created T315056: arclamp_generate_svgs OOMs.
Aug 12 2022, 5:00 AM · Performance-Team, observability, Arc-Lamp
ori added a comment to T293585: [epic] The SSL certificate for Beta cluster domains fails to properly renew & deploy.

We got alerts about the Beta Cluster cert being close to expiry (T311457#8147086) so I again ran:

Aug 12 2022, 12:56 AM · User-AKlapper, Quality-and-Test-Engineering-Team (QTE), Epic, SRE, Traffic, HTTPS, Beta-Cluster-Infrastructure

Aug 11 2022

ori created T315027: Reduce the performance cost of object validation.
Aug 11 2022, 4:16 PM · Abstract Wikipedia team, Performance Issue, function-orchestrator
ori added a comment to T315019: HTTP 500 errors from Beta Cluster Wikifunctions health-check API endpoint.

I see this error occur every few seconds in the function-orchestrator log on deployment-docker-wikifunctions01:

Aug 11 2022, 3:56 PM · MW-1.39-notes (1.39.0-wmf.27; 2022-08-29), Abstract Wikipedia team (Phase θ – Throttling)
ori created T315019: HTTP 500 errors from Beta Cluster Wikifunctions health-check API endpoint.
Aug 11 2022, 3:45 PM · MW-1.39-notes (1.39.0-wmf.27; 2022-08-29), Abstract Wikipedia team (Phase θ – Throttling)
ori added a comment to T138093: Investigate query parameter normalization for MW/services.

I implemented option 3, and created T314868 for tracking the roll-out.

Aug 11 2022, 3:21 AM · MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Patch-For-Review, Traffic-Icebox, Platform Team Legacy (Watching / External), Services (watching), SRE, MediaWiki-General