EBernhardson (EBernhardson)
User

User Details

User Since
Oct 7 2014, 4:49 PM (489 w, 3 d)
Availability
Available
LDAP User
EBernhardson
MediaWiki User
EBernhardson (WMF) [ Global Accounts ]

Recent Activity

Yesterday

EBernhardson added a comment to T356303: Review wikitech:Search and write processes for k8s world.

Yesterday on IRC the question was raised:

this is probably the wrong way around, but i have a python script that uses helmfile apply --set ... to deploy a special backfilling release that is not part of the normal release process. This release runs to completion, but the related custom operator (flink) only understands things that run forever, so my python script also does a helm destroy to clean up afterwards.
I guess my question is, is there a reasonable way to ensure i'm deleting the thing i think i'm deleting? I was considering perhaps adjusting the chart so i can provide a backfill_id label with --set and then use that id in a selector when destroy'ing
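The label idea from the question above could be sketched roughly as follows. This is only an illustration under stated assumptions: `backfill_id` as a chart value is hypothetical (the chart would need to copy it into a label on the FlinkDeployment), and the environment and release names are placeholders. The functions only build command lines and check output; nothing is executed.

```python
import re
import uuid

# Kubernetes label values: at most 63 chars, alphanumeric at both ends,
# with '-', '_' and '.' allowed in between.
LABEL_VALUE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?")


def new_backfill_id() -> str:
    """Generate a short random id that is a valid label value."""
    return uuid.uuid4().hex[:12]


def apply_cmd(environment: str, release: str, backfill_id: str) -> list[str]:
    """Build the helmfile call that deploys the backfill release.

    Assumption: the chart accepts a (hypothetical) `backfill_id` value.
    """
    if not LABEL_VALUE.fullmatch(backfill_id):
        raise ValueError(f"not a valid label value: {backfill_id!r}")
    return [
        "helmfile", "-e", environment,
        "--selector", f"name={release}",
        "apply", "--set", f"backfill_id={backfill_id}",
    ]


def lookup_cmd(backfill_id: str) -> list[str]:
    """Build the kubectl query listing FlinkDeployments with our label."""
    return ["kubectl", "get", "flinkdeployment",
            "-l", f"backfill_id={backfill_id}", "-o", "name"]


def safe_to_destroy(kubectl_output: str) -> bool:
    """Return True only when the label matched exactly one object."""
    names = [line for line in kubectl_output.splitlines() if line.strip()]
    return len(names) == 1
```

The script would run `lookup_cmd(...)` before destroying and abort unless `safe_to_destroy` returns True, so a stale or mistyped id cannot remove an unrelated release.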

From what I understood (and please correct me if I'm wrong! :)) the process is as follows:

  • You deploy a separate helmfile release "...-backfill" that creates a separate FlinkDeployment which launches a job that runs to completion (may take a long time, though)
  • The jobmanager Pod then keeps lingering around (blocking resources: 500m CPU, 100Mi memory) because the flink-operator always configures SHUTDOWN_ON_APPLICATION_FINISH=false for internal reasons
  • You destroy the helmfile release to clean up the jobmanager (by removing the FlinkDeployment object)

One question that comes to mind immediately, and I might be completely off here: Isn't this what a Flink session cluster is for? Having just one Jobmanager that controls multiple Jobs (e.g. the generic one plus backfill) that can be submitted at runtime?

Fri, Feb 23, 10:31 PM · Patch-For-Review, Documentation, Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Discovery-Search (Current work)

Thu, Feb 22

EBernhardson added a comment to T356303: Review wikitech:Search and write processes for k8s world.

To review the documentation changes (there are also two revisions from bking mixed in there): https://wikitech.wikimedia.org/w/index.php?title=Search&diff=2153071&oldid=2127290

Thu, Feb 22, 11:31 PM · Patch-For-Review, Documentation, Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Discovery-Search (Current work)
EBernhardson added a comment to T356303: Review wikitech:Search and write processes for k8s world.

Example query of the rest api (could be nicer if we installed curl or wget, or exposed the rest api directly):

KUBECONFIG=/etc/kubernetes/cirrus-streaming-updater-deploy-staging.config kubectl \
   exec \
   flink-app-consumer-search-backfill-5b9f979487-dsqsb \
   -c flink-main-container \
   -- \
   python3 -c 'import urllib.request; print(urllib.request.urlopen("http://localhost:8081/v1/jobs").read().decode("utf8"))'
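The JSON returned by that query could then be interpreted along these lines. This is a minimal sketch, assuming the response has the shape Flink's REST API documents for the jobs overview (a `jobs` list of objects with `id` and `status`); the sample payload and the choice of which states count as done are illustrative assumptions.

```python
import json

# Job states treated as "the backfill is no longer running" (assumption).
TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELED"}


def job_statuses(body: str) -> dict[str, str]:
    """Map each job id in a /v1/jobs response body to its status."""
    data = json.loads(body)
    return {job["id"]: job["status"] for job in data.get("jobs", [])}


def backfill_done(body: str) -> bool:
    """True when at least one job is listed and none is still active."""
    statuses = job_statuses(body)
    return bool(statuses) and all(
        status in TERMINAL_STATES for status in statuses.values()
    )


# Hypothetical response body, for illustration only.
sample = '{"jobs": [{"id": "0123abcd", "status": "RUNNING"}]}'
print(job_statuses(sample))
print(backfill_done(sample))
```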
Thu, Feb 22, 5:59 PM · Patch-For-Review, Documentation, Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Discovery-Search (Current work)
EBernhardson added a comment to T356303: Review wikitech:Search and write processes for k8s world.

On further review, simply documenting the various commands to run seemed error prone. Attached patch adds a python script that simplifies away most of the reindexing and backfill to ease future burden.

Thu, Feb 22, 12:25 AM · Patch-For-Review, Documentation, Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Discovery-Search (Current work)

Thu, Feb 15

EBernhardson added a comment to T356526: High level of backend errors for CirrusSearch jobs in jobrunners.

Was supposed to be in the backport window today, but train problems blocked that. This is a pretty safe patch though, i'll ship it a little later.

Thu, Feb 15, 10:36 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), serviceops, Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T356526: High level of backend errors for CirrusSearch jobs in jobrunners.

It seems the patch didn't actually make it into wmf.18 as expected, jenkins-bot never finished the merge so this was only deployed in wmf.17. I'll get it shipped there too.

Thu, Feb 15, 7:01 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), serviceops, Discovery-Search (Current work), CirrusSearch

Wed, Feb 14

EBernhardson added a comment to T356303: Review wikitech:Search and write processes for k8s world.

I've been reviewing our options for backfilling and trying to come up with a plan, i think the following will work:

Wed, Feb 14, 10:34 PM · Patch-For-Review, Documentation, Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Discovery-Search (Current work)
EBernhardson moved T356655: Create tool and process to investigate Search update Pipeline failures from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Wed, Feb 14, 6:45 PM · Discovery-Search (Current work)
EBernhardson moved T356526: High level of backend errors for CirrusSearch jobs in jobrunners from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.

This looks resolved now, the bi-hourly spikes have gone away since the monday deployment.

Wed, Feb 14, 5:31 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), serviceops, Discovery-Search (Current work), CirrusSearch

Mon, Feb 12

EBernhardson moved T357353: Application Security Review Request : NetworkSession MediaWiki extension from Incoming to Blocked/Waiting on the Discovery-Search (Current work) board.
Mon, Feb 12, 10:02 PM · Discovery-Search (Current work), secscrum, Security, Application Security Reviews
EBernhardson added a project to T357353: Application Security Review Request : NetworkSession MediaWiki extension : Discovery-Search (Current work).
Mon, Feb 12, 10:02 PM · Discovery-Search (Current work), secscrum, Security, Application Security Reviews
EBernhardson added a subtask for T355267: Add extension NetworkSession to all wmf wikis: T357353: Application Security Review Request : NetworkSession MediaWiki extension .
Mon, Feb 12, 9:55 PM · Discovery-Search (Current work), Wikimedia-extension-review-queue, Wikimedia-Extension-setup
EBernhardson added a parent task for T357353: Application Security Review Request : NetworkSession MediaWiki extension : T355267: Add extension NetworkSession to all wmf wikis.
Mon, Feb 12, 9:55 PM · Discovery-Search (Current work), secscrum, Security, Application Security Reviews
EBernhardson created T357353: Application Security Review Request : NetworkSession MediaWiki extension .
Mon, Feb 12, 9:54 PM · Discovery-Search (Current work), secscrum, Security, Application Security Reviews
EBernhardson created P56688 (An Untitled Masterwork).
Mon, Feb 12, 9:33 PM
EBernhardson moved T354976: Create new NetworkSession mediawiki extension from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.

Updated the mw.org page with the latest changes, so it's now in line with the repository. I think this is enough to call this ticket complete. T355267 is the task for deploying this extension to the wikis.

Mon, Feb 12, 9:20 PM · MW-Interfaces-Team, serviceops-radar, NetworkSession, Patch-For-Review, API Platform, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch
EBernhardson added a subtask for T345185: Provide a method for internal services to run api requests for private wikis: T355267: Add extension NetworkSession to all wmf wikis.
Mon, Feb 12, 9:19 PM · MW-1.42-notes (1.42.0-wmf.16; 2024-01-30), API Platform, serviceops, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch
EBernhardson added a parent task for T355267: Add extension NetworkSession to all wmf wikis: T345185: Provide a method for internal services to run api requests for private wikis.
Mon, Feb 12, 9:19 PM · Discovery-Search (Current work), Wikimedia-extension-review-queue, Wikimedia-Extension-setup
EBernhardson added a comment to T356651: Rebuild and deploy textify plugin.

Released the plugin as -wmf12. Patch above updates the .deb to use the newest versions. MR also up on gitlab to update the dev image (for cindy/dev envs) to use the new version once available.

Mon, Feb 12, 8:37 PM · Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Discovery-Search (Current work)
EBernhardson claimed T356439: [Tracking] Evaluate differences in saneitizer fixes eqiad vs cloudelastic.
Mon, Feb 12, 4:16 PM · Discovery-Search (Current work)

Fri, Feb 9

EBernhardson added a comment to T356526: High level of backend errors for CirrusSearch jobs in jobrunners.

If we need them silenced, best bet is probably to re-enable the writes for these wikis. Can be done with a mediawiki-config patch.

Fri, Feb 9, 5:49 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), serviceops, Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T356526: High level of backend errors for CirrusSearch jobs in jobrunners.

I haven't managed to track down where the "Received cirrusSearchElasticaWrite job for unwritable cluster cloudelastic" error comes from. We recently turned off writes to this cluster from mediawiki on select wikis, but something in the codebase is still trying to create writes even though it shouldn't. Needs more investigation on our side.

Fri, Feb 9, 4:05 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), serviceops, Discovery-Search (Current work), CirrusSearch

Thu, Feb 8

EBernhardson claimed T356655: Create tool and process to investigate Search update Pipeline failures.
Thu, Feb 8, 11:07 PM · Discovery-Search (Current work)

Wed, Feb 7

EBernhardson created P56486 (An Untitled Masterwork).
Wed, Feb 7, 10:43 PM
EBernhardson updated the task description for T356933: Streaming Updater should still make forward progress when one index has problems.
Wed, Feb 7, 10:29 PM · Discovery-Search (Current work)
EBernhardson created T356933: Streaming Updater should still make forward progress when one index has problems.
Wed, Feb 7, 10:26 PM · Discovery-Search (Current work)
EBernhardson moved T354976: Create new NetworkSession mediawiki extension from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Wed, Feb 7, 4:57 PM · MW-Interfaces-Team, serviceops-radar, NetworkSession, Patch-For-Review, API Platform, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch

Mon, Feb 5

EBernhardson added a comment to T356655: Create tool and process to investigate Search update Pipeline failures.

Current process (to be refined). None of this is committed anywhere yet, mostly working out what is going to work.

Mon, Feb 5, 10:12 PM · Discovery-Search (Current work)
EBernhardson added a comment to T356439: [Tracking] Evaluate differences in saneitizer fixes eqiad vs cloudelastic.
Mon, Feb 5, 10:11 PM · Discovery-Search (Current work)
EBernhardson added a comment to T356655: Create tool and process to investigate Search update Pipeline failures.

Idea is something like:

Mon, Feb 5, 5:12 PM · Discovery-Search (Current work)
EBernhardson moved T356526: High level of backend errors for CirrusSearch jobs in jobrunners from Incoming to To Be Deployed on the Discovery-Search (Current work) board.

This is a bit of a non error. What happened is:

Mon, Feb 5, 4:35 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), serviceops, Discovery-Search (Current work), CirrusSearch
EBernhardson removed a project from T356302: setup production Cirrus Streaming Updater alerts : Epic.
Mon, Feb 5, 4:12 PM · Discovery-Search (Current work)

Thu, Feb 1

EBernhardson added a comment to T356439: [Tracking] Evaluate differences in saneitizer fixes eqiad vs cloudelastic.

Started with the ghost page in index errors, since there are only a couple. We have two pages in cloudelastic for frwiki that have been correctly deleted in eqiad but still exist in cloudelastic:

Thu, Feb 1, 11:44 PM · Discovery-Search (Current work)
EBernhardson added a project to T356439: [Tracking] Evaluate differences in saneitizer fixes eqiad vs cloudelastic: Discovery-Search (Current work).
Thu, Feb 1, 8:18 PM · Discovery-Search (Current work)
EBernhardson created T356439: [Tracking] Evaluate differences in saneitizer fixes eqiad vs cloudelastic.
Thu, Feb 1, 8:18 PM · Discovery-Search (Current work)
EBernhardson updated the task description for T356438: Add NetworkSession mediawiki extension to translatewiki.net.
Thu, Feb 1, 8:08 PM · translatewiki.net
EBernhardson added a comment to T354976: Create new NetworkSession mediawiki extension.

Localization - The only localization is the extension description, unclear if necessary (or how).

Thu, Feb 1, 8:08 PM · MW-Interfaces-Team, serviceops-radar, NetworkSession, Patch-For-Review, API Platform, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch
EBernhardson added a subtask for T354976: Create new NetworkSession mediawiki extension: T356438: Add NetworkSession mediawiki extension to translatewiki.net.
Thu, Feb 1, 8:07 PM · MW-Interfaces-Team, serviceops-radar, NetworkSession, Patch-For-Review, API Platform, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch
EBernhardson added a parent task for T356438: Add NetworkSession mediawiki extension to translatewiki.net: T354976: Create new NetworkSession mediawiki extension.
Thu, Feb 1, 8:07 PM · translatewiki.net
EBernhardson created T356438: Add NetworkSession mediawiki extension to translatewiki.net.
Thu, Feb 1, 8:06 PM · translatewiki.net
EBernhardson moved T352335: Deploy the new Cirrus Updater to update select wikis in cloudelastic from Blocked/Waiting to Needs Reporting on the Discovery-Search (Current work) board.

The selected set of wikis has been enabled in production and is performing writes. Issues resulting from this deployment will be dealt with in separate tickets.

Thu, Feb 1, 7:54 PM · Discovery-Search (Current work)
EBernhardson added a comment to T252591: REST API endpoints give confusing errors for invalid OAuth2 access tokens.

It seems like the problem here is that code that runs prior to endpoint-specific code needs a generic way to inform the output layer what error has occurred. Today what we are doing is registering hook handlers for each specific output layer, and throwing different output-specific exceptions.

Thu, Feb 1, 5:47 PM · MW-Interfaces-Team, Patch-Needs-Improvement, MediaWiki-REST-API, Platform Team Workboards (Clinic Duty Team), Platform Team Initiatives (MW REST API in PHP), MediaWiki-extensions-OAuth

Wed, Jan 31

EBernhardson added a comment to T356303: Review wikitech:Search and write processes for k8s world.

First pass review of the administration processes listed on wikitech and which will be changing. This started as only about the streaming updater, but I added a second section on outdated topics. Perhaps another ticket?

Wed, Jan 31, 10:55 PM · Patch-For-Review, Documentation, Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Discovery-Search (Current work)
EBernhardson created T356303: Review wikitech:Search and write processes for k8s world.
Wed, Jan 31, 5:40 PM · Patch-For-Review, Documentation, Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Discovery-Search (Current work)
EBernhardson created T356302: setup production Cirrus Streaming Updater alerts .
Wed, Jan 31, 5:39 PM · Discovery-Search (Current work)

Tue, Jan 30

EBernhardson moved T350186: Cirrus-streaming-updater test: validate relforge indices are correctly updated from Ready for Dev -- SWE to Needs Reporting on the Discovery-Search (Current work) board.
Tue, Jan 30, 4:50 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work)
EBernhardson moved T350186: Cirrus-streaming-updater test: validate relforge indices are correctly updated from Blocked / Waiting to Done on the Data-Platform-SRE (2024.01.22 - 2024.02.11) board.

We followed this data over time and it seemed to stay in line. We've now progressed from relforge to a cloudelastic deployment and we can probably consider this complete.

Tue, Jan 30, 4:50 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work)

Mon, Jan 29

EBernhardson moved T352915: Do not display <languages /> content as search excerpt from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mon, Jan 29, 5:47 PM · Localization Infrastructure FY2023-24, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), Discovery-Search (Current work), MediaWiki-extensions-Translate, MediaWiki-Parser, CirrusSearch
EBernhardson moved T354793: SUP: Adapt saneitizer to allow SUP to operate next to cirrus jobs from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mon, Jan 29, 5:38 PM · MW-1.42-notes (1.42.0-wmf.17; 2024-02-06), Discovery-Search (Current work), CirrusSearch

Jan 24 2024

EBernhardson awarded T355619: Request MediaWiki +2 for Paladox a Like token.
Jan 24 2024, 11:43 PM · MediaWiki-Gerrit-Group-Requests
EBernhardson moved T355066: SUP: Process (large) JSON responses non-blocking to save memory from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Jan 24 2024, 5:03 PM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch

Jan 23 2024

EBernhardson moved T353427: ConsumerApplicationIT should fail when the update request payload changed from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Jan 23 2024, 9:09 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson moved T354197: SUP: deployment: allow passing integers from Needs review to Needs Reporting on the Discovery-Search (Current work) board.

The patch documents changing app.config_files.app\.config\.yaml via the command line which hopefully does as we need. It allows changing the same values, and avoids the problem described in the ticket since the passed arg never needs to be included in app.job.args, and thus doesn't need to pass the is-string check.
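The backslashes in that path matter because helm's `--set` treats a bare dot as a path separator. A small sketch of building such an argument is below; the nested option name and value are hypothetical, and whether the file entry accepts nested keys at all is an assumption. Commas and other special characters in values are not handled here.

```python
def escape_helm_key(segment: str) -> str:
    """Escape dots so --set treats the segment as a single key."""
    return segment.replace(".", "\\.")


def set_arg(path: list[str], value: str) -> str:
    """Join key segments into a key=value string suitable for --set."""
    return ".".join(escape_helm_key(part) for part in path) + "=" + value


# "hypothetical-option" is a placeholder, not a real setting.
print(set_arg(["app", "config_files", "app.config.yaml", "hypothetical-option"], "5"))
```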

Jan 23 2024, 9:00 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson closed T355236: SUP: Provide config option for cirrussearch to partially disable writing to elasticsearch as Invalid.

I believe removing cloudelastic from the list of clusters to write to should be sufficient. mediawiki-config has the appropriate bits to do this on a per-wiki basis.

Jan 23 2024, 6:17 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson closed T355236: SUP: Provide config option for cirrussearch to partially disable writing to elasticsearch, a subtask of T354595: SUP: Production , as Invalid.
Jan 23 2024, 6:17 PM · Data-Platform-SRE ( 2024.02.12 - 2024.03.03), Discovery-Search (Current work), CirrusSearch
EBernhardson moved T353427: ConsumerApplicationIT should fail when the update request payload changed from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Jan 23 2024, 6:14 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T345570: [L] Generate empty datasets when image suggestions time out.

The idea behind the empty dataset is that airflow looks at the hive metadata to see if it's ready for processing. The idea would be to add a partition to the hive metadata that points at nothing. Something like the following:

Jan 23 2024, 4:17 PM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions, Image-Suggestions

Jan 22 2024

EBernhardson moved T353427: ConsumerApplicationIT should fail when the update request payload changed from In Progress to Needs review on the Discovery-Search (Current work) board.
Jan 22 2024, 10:53 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson moved T354793: SUP: Adapt saneitizer to allow SUP to operate next to cirrus jobs from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Jan 22 2024, 6:21 PM · MW-1.42-notes (1.42.0-wmf.17; 2024-02-06), Discovery-Search (Current work), CirrusSearch
EBernhardson moved T352915: Do not display <languages /> content as search excerpt from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Jan 22 2024, 4:04 PM · Localization Infrastructure FY2023-24, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), Discovery-Search (Current work), MediaWiki-extensions-Translate, MediaWiki-Parser, CirrusSearch

Jan 18 2024

EBernhardson claimed T353427: ConsumerApplicationIT should fail when the update request payload changed.
Jan 18 2024, 9:43 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T352915: Do not display <languages /> content as search excerpt.

Note that once deployed this will not instantly fix the pages. The pages will be fixed on the next edit, or when the background reindexer gets to the page (once every ~16 weeks).

Jan 18 2024, 9:42 PM · Localization Infrastructure FY2023-24, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), Discovery-Search (Current work), MediaWiki-extensions-Translate, MediaWiki-Parser, CirrusSearch
EBernhardson claimed T352915: Do not display <languages /> content as search excerpt.
Jan 18 2024, 9:38 PM · Localization Infrastructure FY2023-24, MW-1.42-notes (1.42.0-wmf.15; 2024-01-23), Discovery-Search (Current work), MediaWiki-extensions-Translate, MediaWiki-Parser, CirrusSearch
EBernhardson moved T354976: Create new NetworkSession mediawiki extension from In Progress to Needs review on the Discovery-Search (Current work) board.
Jan 18 2024, 8:47 PM · MW-Interfaces-Team, serviceops-radar, NetworkSession, Patch-For-Review, API Platform, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch
EBernhardson moved T345185: Provide a method for internal services to run api requests for private wikis from In Progress to Blocked/Waiting on the Discovery-Search (Current work) board.
Jan 18 2024, 8:47 PM · MW-1.42-notes (1.42.0-wmf.16; 2024-01-30), API Platform, serviceops, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch
EBernhardson moved T354976: Create new NetworkSession mediawiki extension from Incoming to In Progress on the Discovery-Search (Current work) board.
Jan 18 2024, 8:46 PM · MW-Interfaces-Team, serviceops-radar, NetworkSession, Patch-For-Review, API Platform, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch
EBernhardson added a comment to T345185: Provide a method for internal services to run api requests for private wikis.

The extension is now documented and written, but still needs to finish code review. Perhaps though it would be worth talking about what is the appropriate level of verification that should be applied to the requests. Some options, in order of increasing complexity:

Jan 18 2024, 5:58 PM · MW-1.42-notes (1.42.0-wmf.16; 2024-01-30), API Platform, serviceops, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch

Jan 17 2024

EBernhardson added a project to T355267: Add extension NetworkSession to all wmf wikis: Discovery-Search (Current work).
Jan 17 2024, 10:08 PM · Discovery-Search (Current work), Wikimedia-extension-review-queue, Wikimedia-Extension-setup
EBernhardson created T355267: Add extension NetworkSession to all wmf wikis.
Jan 17 2024, 10:08 PM · Discovery-Search (Current work), Wikimedia-extension-review-queue, Wikimedia-Extension-setup
EBernhardson added a comment to T354976: Create new NetworkSession mediawiki extension.

Went through https://www.mediawiki.org/wiki/Writing_an_extension_for_deployment to make sure we've done what's needed:

Jan 17 2024, 8:42 PM · MW-Interfaces-Team, serviceops-radar, NetworkSession, Patch-For-Review, API Platform, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch

Jan 12 2024

EBernhardson created T354976: Create new NetworkSession mediawiki extension.
Jan 12 2024, 10:16 PM · MW-Interfaces-Team, serviceops-radar, NetworkSession, Patch-For-Review, API Platform, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch
EBernhardson created T354973: Create project tag for NetworkSession MediaWiki extension.
Jan 12 2024, 10:02 PM · NetworkSession, Project-Admins

Jan 9 2024

EBernhardson added a comment to T345185: Provide a method for internal services to run api requests for private wikis.

After reviewing the options here, along with reviewing the current state of the mw k8s deployment, it looks like we can drop the requirement to execute via the job runner infrastructure. If that's the case, is running a pseudo-job still the best plan? Some considerations:

Jan 9 2024, 10:24 PM · MW-1.42-notes (1.42.0-wmf.16; 2024-01-30), API Platform, serviceops, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch

Jan 5 2024

EBernhardson claimed T345185: Provide a method for internal services to run api requests for private wikis.
Jan 5 2024, 5:32 PM · MW-1.42-notes (1.42.0-wmf.16; 2024-01-30), API Platform, serviceops, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch
EBernhardson updated the task description for T353460: The consumer job of the SUP does not achieve its expected throughput.
Jan 5 2024, 5:27 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson updated the task description for T353460: The consumer job of the SUP does not achieve its expected throughput.
Jan 5 2024, 5:26 PM · Discovery-Search (Current work), CirrusSearch

Dec 7 2023

EBernhardson added a comment to T350826: Test backfilling for cirrus-streaming-updater.

Another option we came up with was to backfill to a null flink sink. This would allow measuring the capacity of the flink pipeline by itself, separate from the ability of the chosen elasticsearch cluster to consume those updates.

Dec 7 2023, 5:34 PM · Discovery-Search (Current work), Data-Platform-SRE

Dec 6 2023

EBernhardson added a comment to T352906: mediawiki k8s jobrunner fails connecting to cloudelastic with a TLS error.

In my estimation an appropriate solution here is to move the cirrusCheckerJob back to the old job runners, and bring them back after solving the TLS issue.

Dec 6 2023, 11:02 PM · serviceops, Discovery-Search (Current work), MW-on-K8s
EBernhardson added a comment to T352906: mediawiki k8s jobrunner fails connecting to cloudelastic with a TLS error.

Cloudelastic uses acmechief for its TLS certificates, vs most prod services which probably (?) have an internally signed certificate. It seems plausible that the problem has something to do with the certs coming from acmechief (not the certs themselves, but how envoy validates them).

Dec 6 2023, 9:25 PM · serviceops, Discovery-Search (Current work), MW-on-K8s
EBernhardson added a project to T352906: mediawiki k8s jobrunner fails connecting to cloudelastic with a TLS error: Discovery-Search (Current work).
Dec 6 2023, 9:23 PM · serviceops, Discovery-Search (Current work), MW-on-K8s
EBernhardson added a comment to T352906: mediawiki k8s jobrunner fails connecting to cloudelastic with a TLS error.

This, combined with another bug that doesn't correctly recognize these failures, has resulted in an increase of cirrusSearchLinksUpdate from 300-500/s to around 800/s

Dec 6 2023, 7:51 PM · serviceops, Discovery-Search (Current work), MW-on-K8s
EBernhardson added a comment to T352906: mediawiki k8s jobrunner fails connecting to cloudelastic with a TLS error.

For comparison envoy works fine from mwdeploy2002 itself:

deploy2002 $ mwscript shell.php testwiki
Psy Shell v0.11.21 (PHP 7.4.33 — cli) by Justin Hileman
> $ch = curl_init('http://localhost:6105')
= curl resource #1575
Dec 6 2023, 7:25 PM · serviceops, Discovery-Search (Current work), MW-on-K8s
EBernhardson updated the task description for T352906: mediawiki k8s jobrunner fails connecting to cloudelastic with a TLS error.
Dec 6 2023, 7:24 PM · serviceops, Discovery-Search (Current work), MW-on-K8s
EBernhardson created T352906: mediawiki k8s jobrunner fails connecting to cloudelastic with a TLS error.
Dec 6 2023, 7:23 PM · serviceops, Discovery-Search (Current work), MW-on-K8s

Dec 5 2023

EBernhardson added a comment to T350186: Cirrus-streaming-updater test: validate relforge indices are correctly updated.

I've run this a few times, it claims the indices in relforge match the ones in production. I'm still a bit suspicious that it passed on the first try, maybe could try harder to see what is broken. But we've done the testing and what we have so far claims to work.

Dec 5 2023, 4:55 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work)

Dec 4 2023

EBernhardson added a comment to T351503: Enable mediawiki.cirrussearch.page_rerender.v1 on all public wikis.

Current plan for gradual deploy is to start with a selection of wikis that add up to ~25% of the total rate. If that's too high we can remove commonswiki from the set, which should bring it down to ~13%. Before we can turn those events on I believe we need to have the topic partitioning changes applied; the topic currently has a single partition.

Dec 4 2023, 7:52 PM · Data-Platform-SRE (2024.01.01 - 2024.01.21), Discovery-Search (Current work), serviceops
EBernhardson added a comment to T352335: Deploy the new Cirrus Updater to update select wikis in cloudelastic.

It looks like we are estimating the page rerender events to be at approximately the same rate as the existing cirrusSearchLinksUpdate jobs. Some related stats, estimated from one week (nov 27-dec 3) of kafka history. This reuses the prior estimate of 3 copies of the data at 0.6kB per event with 7 days retention. I added the row about removing commons since it is the largest of the selected wikis, giving an option to reduce the initial rollout.
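The sizing arithmetic described here (3 copies, 0.6kB per event, 7 days retention) can be written out as a short sketch. The event rate passed in below is a placeholder rather than a measured figure, and 0.6kB is assumed to mean 600 bytes.

```python
SECONDS_PER_DAY = 86_400


def topic_size_bytes(
    events_per_second: float,
    bytes_per_event: float = 600,   # 0.6kB per event, assuming 1kB = 1000 bytes
    copies: int = 3,                # copies of the data
    retention_days: int = 7,
) -> float:
    """Steady-state size of a topic once retention is fully populated."""
    retained_events = events_per_second * retention_days * SECONDS_PER_DAY
    return retained_events * bytes_per_event * copies


# Placeholder rate for illustration only.
size = topic_size_bytes(400)
print(f"{size / 1e9:.1f} GB")
```

Because the result scales linearly with the event rate, dropping a wiki that contributes a given share of events reduces the estimate by the same share.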

Dec 4 2023, 7:27 PM · Discovery-Search (Current work)
EBernhardson created P54127 cirrusSearchLinksUpdates jobs per-database for 2023-11-27 through 2023-12-04.
Dec 4 2023, 7:20 PM
EBernhardson added a comment to T352335: Deploy the new Cirrus Updater to update select wikis in cloudelastic.

Ideally we want a size estimate, pre-deploy, of the refresh topic. Primarily so we can verify the accuracy of our estimate of the total size once refreshes are enabled on all wikis.

Dec 4 2023, 4:14 PM · Discovery-Search (Current work)

Dec 1 2023

EBernhardson added a comment to T352534: [airflow] Inserting task notes is not working since upgrade to version 2.7.3.

We do use the feature, although tbh I don't know how useful it is. Sometimes we skip runs that failed because canary events didn't fire, or we have to re-run a task with some extra settings because something weird about that hour is blowing out memory limits, and we make a note of that in the dagrun. But i don't know we've ever looked at those notes afterwards.

Dec 1 2023, 4:50 PM · Data-Platform-SRE (2023.12.01 - 2023.12.31)
EBernhardson merged T352483: Can't save dagrun notes in airflow after 2.7.3 migration into T352534: [airflow] Inserting task notes is not working since upgrade to version 2.7.3.
Dec 1 2023, 4:36 PM · Data-Platform-SRE (2023.12.01 - 2023.12.31)
EBernhardson merged task T352483: Can't save dagrun notes in airflow after 2.7.3 migration into T352534: [airflow] Inserting task notes is not working since upgrade to version 2.7.3.
Dec 1 2023, 4:36 PM · Data-Platform-SRE, Data Pipelines

Nov 30 2023

EBernhardson moved T350299: EventBus change events involving redirect changes are sometimes incorrect from Needs review to Needs Reporting on the Discovery-Search (Current work) board.

Rerunning the reproduction steps from above, on edit from a redirect into a normal wikitext page we get the following, looks fixed:

"page":{"page_id":153275,"page_title":"Cirrus_Updater","namespace_id":0,"is_redirect":false}
Nov 30 2023, 9:57 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), MediaWiki-Page-derived-data, MediaWiki-Engineering, Discovery-Search (Current work)
EBernhardson claimed T352335: Deploy the new Cirrus Updater to update select wikis in cloudelastic.
Nov 30 2023, 9:09 PM · Discovery-Search (Current work)
EBernhardson created T352483: Can't save dagrun notes in airflow after 2.7.3 migration.
Nov 30 2023, 8:37 PM · Data-Platform-SRE, Data Pipelines
EBernhardson moved T352335: Deploy the new Cirrus Updater to update select wikis in cloudelastic from Incoming to Needs review on the Discovery-Search (Current work) board.
Nov 30 2023, 7:57 PM · Discovery-Search (Current work)
EBernhardson updated the task description for T352335: Deploy the new Cirrus Updater to update select wikis in cloudelastic.
Nov 30 2023, 7:55 PM · Discovery-Search (Current work)
EBernhardson added a comment to T352335: Deploy the new Cirrus Updater to update select wikis in cloudelastic.
Nov 30 2023, 7:55 PM · Discovery-Search (Current work)
EBernhardson renamed T352475: Grant Access to archiva-deployers for pfischer from Grant Access to archiva-deploy for pfischer to Grant Access to archiva-deployers for pfischer.
Nov 30 2023, 7:30 PM · SRE, LDAP-Access-Requests
EBernhardson updated the task description for T352475: Grant Access to archiva-deployers for pfischer.
Nov 30 2023, 7:29 PM · SRE, LDAP-Access-Requests