EBernhardson (EBernhardson)
User

User Details

User Since
Oct 7 2014, 4:49 PM (498 w, 5 d)
Availability
Available
LDAP User
EBernhardson
MediaWiki User
EBernhardson (WMF) [ Global Accounts ]

Recent Activity

Fri, Apr 26

EBernhardson moved T361870: Stabilize "consumer-cloudelastic" Search Update Pipeline job from Incoming to Needs Reporting on the Discovery-Search (Current work) board.

The consumer seems generally stable. It involved changes to both the application for better error handling, and an increase in the taskmanager memory above. The pods had been running for a week uninterrupted until we brought them down yesterday to verify some new alerting.

Fri, Apr 26, 8:06 PM · Discovery-Search (Current work), Data-Platform-SRE (2024.04.15 - 2024.05.05), Patch-For-Review
EBernhardson added a comment to T359215: mediawiki_cirrussearch_request data is regularly late.

Poked at the data-engineering-alerts archive, it looks like these were firing daily and then stopped on Apr 10. I think we can optimistically call this fixed?

Fri, Apr 26, 7:54 PM · Performance Issue, Data-Platform
EBernhardson moved T359580: CirrusSearch should not send outdated cirrussearch-request events from Blocked/Waiting to Needs Reporting on the Discovery-Search (Current work) board.

Per the data-engineering-alerts list archive, these were triggering daily alerts for the two weeks prior to 2024-04-10 and haven't been emitted since. This is two days after the fix was applied, which is slightly curious. But I remember something about event refining operating over a window of hours, so maybe it took some time to pass. I'm willing to call this complete with the errors stopping.

Fri, Apr 26, 7:53 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T363521: Completion suggester can promote a bad build.

Root cause of the network issue has been tracked down in T363516#9748908: a layer-2 issue with LVS and new racks. With that fixed this error should be triggered less frequently, but we should still apply some resiliency updates to the related code.

Fri, Apr 26, 7:04 PM · serviceops-radar, CirrusSearch, Discovery-Search
EBernhardson added a comment to T363516: Many search suggestions missing when connecting to eqiad, but not when connecting to codfw.

Decided to delay bringing traffic back to eqiad until Monday. To be confident in the daily indices we would probably want to rebuild them all, but that takes many hours and it would finish only a few hours before I'm heading out for the weekend. Didn't seem like a great time to bring traffic back. The daily rebuilds will run, we can look at them on Monday and bring traffic back if everything is back to normal.

Fri, Apr 26, 6:51 PM · CirrusSearch, Discovery-Search (Current work), Patch-For-Review
EBernhardson moved T359580: CirrusSearch should not send outdated cirrussearch-request events from Needs review to Blocked/Waiting on the Discovery-Search (Current work) board.
Fri, Apr 26, 6:49 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T359215: mediawiki_cirrussearch_request data is regularly late.

I poked around a little, but I'm not sure how to check if that fix solved the issue or not. I submitted a join request to the data-engineering-alerts mailing list, and can check the archives for current frequency after being accepted. I assume these alerts are also recorded by whatever sends them, but I wasn't sure where that is.

Fri, Apr 26, 6:48 PM · Performance Issue, Data-Platform
EBernhardson moved T357066: CirrusSearch\BuildDocument\BuildDocumentException: ParserOutput cannot be obtained. from Needs review to Needs Reporting on the Discovery-Search (Current work) board.

These look to have subsided, now 12 in the last 4 days.

Fri, Apr 26, 5:57 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Discovery-Search (Current work), User-brennen, CirrusSearch, Wikimedia-production-error
EBernhardson edited P61254 (An Untitled Masterwork).
Fri, Apr 26, 4:18 PM
EBernhardson created P61254 (An Untitled Masterwork).
Fri, Apr 26, 4:01 PM

Thu, Apr 25

EBernhardson added a comment to T363521: Completion suggester can promote a bad build.

One thing we do have in logstash, although not specifically from the script running eqiad, is a surprising (to me) number of general network errors talking to the elasticsearch cluster. Looking at the Host overview dashboard for mwmaint1002 for today, I can see that there were intermittent network errors from 03:00 until 06:50. Our completion indices build ran from 02:30 to 06:45. Looking at the last 7 days there are consistently network errors during this time period. I'm assuming we are causing those, but we could try running it at a different time of day.

Thu, Apr 25, 9:13 PM · serviceops-radar, CirrusSearch, Discovery-Search
EBernhardson added a comment to T358350: Search Metrics - Successful searches.

Started looking over this the other day. Some data we have available:

Thu, Apr 25, 8:29 PM · Discovery-Search (Current work)
EBernhardson added a comment to T363521: Completion suggester can promote a bad build.

Wrote a terrible bash script to compare titlesuggest doc counts between the two clusters. This suggests the problem isn't limited to enwiki
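
A comparison of this kind could be sketched in Python roughly as follows. This is not the bash script mentioned above; it assumes the per-wiki titlesuggest doc counts have already been fetched from each cluster into plain dicts, and the `find_mismatched` name and the 10% tolerance are illustrative choices.

```python
def find_mismatched(eqiad_counts, codfw_counts, tolerance=0.10):
    """Return wikis whose doc counts differ by more than `tolerance`.

    Both arguments map a wiki name to its titlesuggest doc count
    (hypothetical input shape; fetching the counts is out of scope).
    """
    mismatched = {}
    for wiki in sorted(set(eqiad_counts) | set(codfw_counts)):
        a = eqiad_counts.get(wiki, 0)
        b = codfw_counts.get(wiki, 0)
        larger = max(a, b)
        if larger == 0:
            continue  # empty in both clusters, nothing to compare
        # Relative difference, measured against the larger of the two counts.
        if abs(a - b) / larger > tolerance:
            mismatched[wiki] = (a, b)
    return mismatched
```

Any wiki that shows up in the result for one datacenter but not the other is a candidate for a bad build.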

Thu, Apr 25, 8:27 PM · serviceops-radar, CirrusSearch, Discovery-Search
EBernhardson added a comment to T363516: Many search suggestions missing when connecting to eqiad, but not when connecting to codfw.

Decided against shuffling traffic, rebuild is almost complete already for enwiki. I can see in the logs where the enwiki eqiad build jumped from 44% to complete, but no reason why. Nothing in logstash for that period either. I've created T363521 to put something in place to prevent this in the future.

Thu, Apr 25, 8:20 PM · CirrusSearch, Discovery-Search (Current work), Patch-For-Review
EBernhardson created T363521: Completion suggester can promote a bad build.
Thu, Apr 25, 7:48 PM · serviceops-radar, CirrusSearch, Discovery-Search
EBernhardson created P61225 mwmaint1002:mediawiki_job_cirrus_build_completion_indices_eqiad syslog for enwiki.
Thu, Apr 25, 7:47 PM
EBernhardson added a comment to T363516: Many search suggestions missing when connecting to eqiad, but not when connecting to codfw.

Hmm, I can confirm this is happening. The completion index is built new every day in each datacenter. Usually they are the same, but somehow the eqiad index is about half the size of the codfw index (6.7g vs 14.5g). Autocomplete is fairly high traffic, so we should probably shift the autocomplete traffic to codfw until it can be fixed, which probably requires a rebuild and a couple of hours.

Thu, Apr 25, 7:25 PM · CirrusSearch, Discovery-Search (Current work), Patch-For-Review

Wed, Apr 24

EBernhardson created P61184 (An Untitled Masterwork).
Wed, Apr 24, 9:31 PM

Fri, Apr 19

EBernhardson added a comment to T358345: [Epic] Search metrics 2024.

For those following along, have a look at the comment in T358349#9727873 to identify the notebook helping to fill a table in @EBernhardson's namespace and an example Superset.

Erik, nice work so far!

I'm interested to see migration of the coarse-grained session ratios in the subtasks, which are expressed in the previous notebooks such as T358352-user-sessions-using-search.ipynb, brought into the Superset dashboard (the Python-deduced number of actors, as well as unique_devices_per_domain_daily divisors, are helpful in particular for the AC).

Fri, Apr 19, 11:25 PM · Discovery-Search (Current work), Epic

Thu, Apr 18

EBernhardson added a comment to T358349: Search Metrics - Number of Searches.

This chart should (eventually) contain the same data as gehel posted above. As of this moment only 5 days are calculated but the aggregate % have already settled in. I only spent a couple minutes to make the chart, this probably isn't the best way to present the data. But an example: https://superset.wikimedia.org/explore/?slice_id=3368

Thu, Apr 18, 8:21 PM · Discovery-Search (Current work)
EBernhardson added a comment to T358349: Search Metrics - Number of Searches.

@EBernhardson should we close this as a duplicate and move "(full text search, go bar, ...)" as a dimension aspect in T358352: Search Metrics - Number of user sessions using search?

Thu, Apr 18, 6:43 PM · Discovery-Search (Current work)

Wed, Apr 17

EBernhardson added a comment to T358599: Integrate Saneitizer with SUP.

One potential improvement we talked about, the initial method of configuring the saneitizer adds new pieces to the flink execution graph. This means you have to play around with some dangerous options to pause saneitization, losing the current saneitization state in the process. We should update the operation of the flag to enable saneitization so that it still connects to the graph, but never emits any events or state changes. The general idea is that the shape of the graph should not change due to configuration changes, as graph shape changes require careful deployments.
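
The idea of a flag that changes behaviour but not graph shape can be illustrated with a small sketch. This is plain Python rather than the Flink API, and `SaneitizerStage` and `build_graph` are hypothetical names made up for the example; it only shows that a disabled stage can stay wired in while emitting nothing and touching no state.

```python
from dataclasses import dataclass, field


@dataclass
class SaneitizerStage:
    """Hypothetical stage that is always wired into the graph."""
    enabled: bool
    state: dict = field(default_factory=dict)

    def process(self, page_id):
        if not self.enabled:
            # Disabled: stay connected, but emit nothing and leave state untouched.
            return []
        self.state[page_id] = self.state.get(page_id, 0) + 1
        return [("check", page_id)]


def build_graph(saneitizer_enabled):
    # The list of stage names is the same regardless of the flag.
    stage = SaneitizerStage(enabled=saneitizer_enabled)
    return ["source", "saneitizer", "sink"], stage
```

Because the stage list is identical in both cases, toggling the flag in this sketch never counts as a graph shape change.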

Wed, Apr 17, 7:49 PM · MW-1.42-notes (1.42.0-wmf.26; 2024-04-09), Discovery-Search (Current work)
EBernhardson added a comment to T358599: Integrate Saneitizer with SUP.

Initial deployment has been a bit rocky, in particular saneitizer is visiting pages with error states we haven't seen in normal operation yet. The pipeline has been running for a couple hours now without issues. If it's still running without restarts by tomorrow we can probably consider the initial deployment complete.

Wed, Apr 17, 7:38 PM · MW-1.42-notes (1.42.0-wmf.26; 2024-04-09), Discovery-Search (Current work)
EBernhardson moved T358518: Deploy streaming updater for 100% of writes to cloudelastic from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Wed, Apr 17, 7:36 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Discovery-Search (Current work)

Tue, Apr 16

EBernhardson added a comment to T362367: [wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed .

This looks to be all caught back up from our side

Tue, Apr 16, 3:24 PM · Growth-Team (Sprint 12 (Growth Team)), Discovery-Search (Current work), CirrusSearch, Regression, GrowthExperiments
EBernhardson edited P60544 (An Untitled Masterwork).
Tue, Apr 16, 12:23 AM
EBernhardson created P60544 (An Untitled Masterwork).
Tue, Apr 16, 12:20 AM

Mon, Apr 15

EBernhardson added a comment to T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair.

All indices on cloudelastic look to be recreated now as well. It hasn't been running this whole time, it just took me a while to get around to verifying the operation and finishing the couple of wikis that failed the first two times through.

Mon, Apr 15, 5:59 PM · Discovery-Search (Current work)
EBernhardson moved T342444: Reindex all wikis to enable apostrophe normalization, camelCase handling, acronym handling, word_break_helper, and icu_tokenizer/_repair from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Mon, Apr 15, 5:59 PM · Discovery-Search (Current work)
EBernhardson moved T359580: CirrusSearch should not send outdated cirrussearch-request events from Blocked/Waiting to Needs review on the Discovery-Search (Current work) board.
Mon, Apr 15, 3:15 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T362367: [wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed .

it was backfilling over the weekend but got stuck around feb 6th. It's back to processing hourlies, i expect they will keep decreasing for at least 12 more hours of processing based on the current rates, as long as it doesn't get stuck again.

Mon, Apr 15, 2:07 PM · Growth-Team (Sprint 12 (Growth Team)), Discovery-Search (Current work), CirrusSearch, Regression, GrowthExperiments

Fri, Apr 12

EBernhardson added a comment to T362367: [wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed .

They are stored and processing through now at a rate of something like one hour per minute. It should catch up soon enough.

Fri, Apr 12, 11:18 PM · Growth-Team (Sprint 12 (Growth Team)), Discovery-Search (Current work), CirrusSearch, Regression, GrowthExperiments
EBernhardson added a comment to T362367: [wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed .

Hmm, indeed it looks like hourly transfers have been stuck for quite some time. Somehow airflow thinks there are two hours running and it never failed them. It is still waiting for them to complete even though nothing is running. It looks like we never set an SLA value on this dag, so its failures probably don't get properly recognized. I've reset the two tasks that were stuck and will see how I can get these all moving again, along with adding an SLA so it properly alerts.

Fri, Apr 12, 9:02 PM · Growth-Team (Sprint 12 (Growth Team)), Discovery-Search (Current work), CirrusSearch, Regression, GrowthExperiments
EBernhardson created P60469 (An Untitled Masterwork).
Fri, Apr 12, 4:57 PM

Thu, Apr 11

EBernhardson added a comment to T358352: Search Metrics - Number of user sessions using search.

Adam suggested taking an easier way out and using the actor_signature definition of a unique device. This hashes together a couple values in the web request to create a fingerprint. The absolute number won't really be comparable to the overall unique devices metric, but we can calculate a % of actor_signatures and assume that it's in the same ballpark.
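
A rough sketch of the calculation, under loud assumptions: the real actor_signature definition is not given here, so hashing the IP and user agent is only a stand-in, and the `(ip, user_agent, used_search)` input shape and both function names are hypothetical.

```python
import hashlib


def actor_signature(ip, user_agent):
    """Hash request fields into a fingerprint (illustrative field choice)."""
    raw = f"{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def percent_using_search(requests):
    """`requests` is an iterable of (ip, user_agent, used_search) tuples."""
    all_actors = set()
    search_actors = set()
    for ip, user_agent, used_search in requests:
        sig = actor_signature(ip, user_agent)
        all_actors.add(sig)
        if used_search:
            search_actors.add(sig)
    if not all_actors:
        return 0.0
    # Share of distinct fingerprints that issued at least one search.
    return 100.0 * len(search_actors) / len(all_actors)
```

Since numerator and denominator use the same fingerprint, the ratio stays meaningful even though the absolute counts are not comparable to the unique devices metric.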

Thu, Apr 11, 5:17 PM · Discovery-Search (Current work)
EBernhardson added a comment to T358351: Search Metrics - Read traffic generated by Search.

Worked through most of this and can compute single day stats that seem plausible with a notebook. Will come back to it once the other metrics are figured out and extend this to calculate 90 days of dailies and offer monthly and ~quarterly numbers over those daily stats. To follow up on the above:

Thu, Apr 11, 5:09 PM · MW-1.43-notes (1.43.0-wmf.1; 2024-04-16), Discovery-Search (Current work)

Tue, Apr 9

EBernhardson added a comment to T358351: Search Metrics - Read traffic generated by Search.

Started to look into this a bit closer. We will probably need to do custom work for each endpoint we want to classify. To start with:

Tue, Apr 9, 5:49 PM · MW-1.43-notes (1.43.0-wmf.1; 2024-04-16), Discovery-Search (Current work)
EBernhardson added a comment to T358345: [Epic] Search metrics 2024.

In terms of the requested dimensions:

Tue, Apr 9, 5:23 PM · Discovery-Search (Current work), Epic

Mon, Apr 8

EBernhardson added a comment to T358351: Search Metrics - Read traffic generated by Search.

If we want this to be directly comparable to page views then i imagine this should be implemented as a classifier against the web requests table. We would miss a few narrow cases with cross-domain search results (sister-search) but I suspect the referrer attached to the page views is sufficient to classify page views as from-search or not.
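
As a minimal sketch of such a classifier: the rules below (a `search` query parameter, or a path ending in `Special:Search`) are assumptions for illustration rather than the logic that was eventually implemented, and `is_from_search` is a hypothetical name.

```python
from urllib.parse import urlparse, parse_qs


def is_from_search(referrer):
    """Guess whether a page view's referrer was a search results page.

    The rules here (a `search` query parameter, or a Special:Search
    path) are assumptions for illustration, not the production logic.
    """
    if not referrer:
        return False
    parsed = urlparse(referrer)
    if "search" in parse_qs(parsed.query):
        return True
    return parsed.path.endswith("/Special:Search")
```

A referrer-only rule like this would not see the cross-domain (sister-search) cases mentioned above unless the referring domain is also taken into account.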

Mon, Apr 8, 9:37 PM · MW-1.43-notes (1.43.0-wmf.1; 2024-04-16), Discovery-Search (Current work)
EBernhardson added a comment to T358352: Search Metrics - Number of user sessions using search.

If we want a very simple count we currently record a weak fingerprint of the browser which is basically a hash of the ip address and the username. Due to the way this data is collected it does not include cached results, primarily that is short autocompletes and the related articles. This can be counted over whatever time dimension we want. The downside of this is that it's not directly comparable to anything. It is an absolute number and the directionality would be meaningful, but as a standalone datapoint it would be hard to say these sessions represent x% of all unique devices.

Mon, Apr 8, 9:33 PM · Discovery-Search (Current work)
EBernhardson moved T359580: CirrusSearch should not send outdated cirrussearch-request events from Incoming to Blocked/Waiting on the Discovery-Search (Current work) board.

Moving to waiting, as we need to wait and see if changing the log buffering fixed the issue or not.

Mon, Apr 8, 9:15 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson claimed T359580: CirrusSearch should not send outdated cirrussearch-request events.
Mon, Apr 8, 8:05 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T359580: CirrusSearch should not send outdated cirrussearch-request events.

I took a look over the actual event generation, but I can't see why meta.dt would be outdated. Our request logging does cache some things, but the meta information isn't one of them. We fetch the value for meta.dt from wfTimestamp() (global clock) and immediately provide the event to logging. Logging does put the request into a second deferred update, but as long as we are running from inside a deferred update, the system guarantees that any deferred update submitted while another is running will run immediately after it (via a scope-stack abstraction).

Mon, Apr 8, 8:04 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson updated subscribers of T357353: Application Security Review Request : NetworkSession MediaWiki extension .

Certainly we can meet up. This is a pretty narrow extension, I think we can get by with only me from our side. Feel free to schedule when you have availability. @pfischer or @dcausse could perhaps be optional attendees.

Mon, Apr 8, 7:03 PM · Discovery-Search (Current work), secscrum, Security, Application Security Reviews
EBernhardson claimed T357066: CirrusSearch\BuildDocument\BuildDocumentException: ParserOutput cannot be obtained..
Mon, Apr 8, 6:52 PM · MW-1.43-notes (1.43.0-wmf.2; 2024-04-23), Discovery-Search (Current work), User-brennen, CirrusSearch, Wikimedia-production-error

Wed, Apr 3

EBernhardson moved T358599: Integrate Saneitizer with SUP from In Progress to To Be Deployed on the Discovery-Search (Current work) board.
Wed, Apr 3, 6:24 PM · MW-1.42-notes (1.42.0-wmf.26; 2024-04-09), Discovery-Search (Current work)
EBernhardson moved T356933: Streaming Updater should still make forward progress when one index has problems from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Wed, Apr 3, 6:22 PM · Discovery-Search (Current work)

Tue, Apr 2

EBernhardson created P59237 yaml-aware diff of bad patch.
Tue, Apr 2, 10:08 PM

Mar 25 2024

EBernhardson moved T358413: Byte size not plurialized in search results with interface in French from Incoming to Blocked/Waiting on the Discovery-Search (Current work) board.

The interface message as provided above is search-result-size. The english version, provided by dev, is as follows:

Mar 25 2024, 5:33 PM · I18n, MediaWiki-Internationalization, CirrusSearch

Mar 20 2024

EBernhardson created T360536: Increase retention of training data.
Mar 20 2024, 3:38 PM · Discovery-Search

Mar 5 2024

EBernhardson added a comment to T359136: Global-search is showing duplicate results.

We are in the process of deploying a new updater for CirrusSearch, with cloudelastic as the first destination cluster. Duplicates could be a result of that, and are good to report so we can get everything working great before moving on to the primary search clusters.

Mar 5 2024, 5:47 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24), Discovery-Search, Internet-Archive, Tool-global-search

Feb 29 2024

EBernhardson updated subscribers of T358541: 400 - Bad Request on any Global Search.

@bking this is likely related to the transition of cloudelastic to private ips? I'll take a look later if you don't have ideas.

Feb 29 2024, 3:13 PM · Discovery-Search, Data-Platform-SRE, Tool-global-search

Feb 28 2024

EBernhardson moved T358518: Deploy streaming updater for 100% of writes to cloudelastic from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Feb 28 2024, 6:15 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Discovery-Search (Current work)

Feb 27 2024

EBernhardson updated the task description for T358599: Integrate Saneitizer with SUP.
Feb 27 2024, 5:05 PM · MW-1.42-notes (1.42.0-wmf.26; 2024-04-09), Discovery-Search (Current work)
EBernhardson added a subtask for T317045: [Epic] Re-architect the Search Update Pipeline: T358599: Integrate Saneitizer with SUP.
Feb 27 2024, 4:41 PM · Discovery-Search (Current work), Epic
EBernhardson added a parent task for T358599: Integrate Saneitizer with SUP: T317045: [Epic] Re-architect the Search Update Pipeline.
Feb 27 2024, 4:41 PM · MW-1.42-notes (1.42.0-wmf.26; 2024-04-09), Discovery-Search (Current work)
EBernhardson added a project to T358599: Integrate Saneitizer with SUP: Discovery-Search (Current work).
Feb 27 2024, 4:41 PM · MW-1.42-notes (1.42.0-wmf.26; 2024-04-09), Discovery-Search (Current work)
EBernhardson updated the task description for T358599: Integrate Saneitizer with SUP.
Feb 27 2024, 4:36 PM · MW-1.42-notes (1.42.0-wmf.26; 2024-04-09), Discovery-Search (Current work)
EBernhardson created T358599: Integrate Saneitizer with SUP.
Feb 27 2024, 4:35 PM · MW-1.42-notes (1.42.0-wmf.26; 2024-04-09), Discovery-Search (Current work)

Feb 26 2024

EBernhardson moved T358518: Deploy streaming updater for 100% of writes to cloudelastic from Incoming to Needs review on the Discovery-Search (Current work) board.
Feb 26 2024, 7:02 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Discovery-Search (Current work)
EBernhardson created T358518: Deploy streaming updater for 100% of writes to cloudelastic.
Feb 26 2024, 7:00 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Discovery-Search (Current work)
EBernhardson added a comment to T358061: Global Search is down: 500: Internal Server Error / Could not resolve host: cloudelastic1004.wikimedia.org.

I suspect at the time we initially set up global-search we didn't have the cloudelastic.wikimedia.org alias up and running yet, but now that it exists, global-search should certainly point at it instead of individual servers.

Feb 26 2024, 4:20 PM · Tool-global-search

Feb 23 2024

EBernhardson added a comment to T356303: Review wikitech:Search and write processes for k8s world.

Yesterday on IRC the question was raised:

this is probably the wrong way around, but i have a python script that uses helmfile apply --set ... to deploy a special backfilling release that is not part of the normal release process. This release runs to completion, but the related custom operator (flink) only understands things that run forever, so my python script also does a helm destroy to clean up afterwards.
I guess my question is, is there a reasonable way to ensure i'm deleting the thing i think i'm deleting? I was considering perhaps adjusting the chart so i can provide a backfill_id label with --set and then use that id in a selector when destroy'ing

From what I understood (and please correct me if I'm wrong! :)) the process is as follows:

  • You deploy a separate helmfile release "...-backfill" that creates a separate FlinkDeployment which launches a job that runs to completion (may take a long time, though)
  • The jobmanager Pod then keeps lingering around (blocking resources, 500m CPU, 100Mi Memory) because the flink-operator configures SHUTDOWN_ON_APPLICATION_FINISH=false in any case for internal reasons
  • You destroy the helmfile release to clean up the jobmanager (by removing the FlinkDeployment object)

One question that comes to mind immediately, and I might be completely off here: Isn't this what a Flink session cluster is for? Having just one Jobmanager that controls multiple Jobs (e.g. the generic one plus backfill) that can be submitted at runtime?
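
The backfill_id idea from the IRC question could look roughly like this. It is only a sketch: `backfill_id` as a chart value and resource label is hypothetical (the chart would need changing to support it), selecting the release by `name=` is an assumption, and the function just builds argv lists without running anything.

```python
def backfill_commands(release, backfill_id):
    """Build argv lists for a labelled backfill deploy and cleanup check.

    `backfill_id` as a chart value and resource label is the hypothetical
    part; the chart would need to be changed to support it.
    """
    deploy = ["helmfile", "--selector", f"name={release}",
              "apply", "--set", f"backfill_id={backfill_id}"]
    # List what carries the label before destroying anything.
    verify = ["kubectl", "get", "flinkdeployment",
              "-l", f"backfill_id={backfill_id}", "-o", "name"]
    destroy = ["helmfile", "--selector", f"name={release}", "destroy"]
    return deploy, verify, destroy
```

Running the verify step first and comparing its output against the expected release gives the script a way to confirm it is about to delete the thing it thinks it is deleting.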

Feb 23 2024, 10:31 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Documentation, Discovery-Search (Current work)

Feb 22 2024

EBernhardson added a comment to T356303: Review wikitech:Search and write processes for k8s world.

To review the documentation changes (there are also two revisions from bking mixed in there): https://wikitech.wikimedia.org/w/index.php?title=Search&diff=2153071&oldid=2127290

Feb 22 2024, 11:31 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Documentation, Discovery-Search (Current work)
EBernhardson added a comment to T356303: Review wikitech:Search and write processes for k8s world.

Example query of the rest api (could be nicer if we installed curl or wget, or exposed the rest api directly):

KUBECONFIG=/etc/kubernetes/cirrus-streaming-updater-deploy-staging.config kubectl \
   exec \
   flink-app-consumer-search-backfill-5b9f979487-dsqsb \
   -c flink-main-container \
   -- \
   python3 -c 'import urllib.request; print(urllib.request.urlopen("http://localhost:8081/v1/jobs").read().decode("utf8"))'
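
Once the response body is in hand, checking whether the backfill has finished could be sketched as below. It assumes the `/v1/jobs` body has the shape `{"jobs": [{"id": ..., "status": ...}]}` and that `FINISHED` is the status of interest; `unfinished_jobs` is a hypothetical helper name.

```python
import json


def unfinished_jobs(response_body):
    """Return ids of jobs whose status is not FINISHED.

    Assumes the body looks like {"jobs": [{"id": ..., "status": ...}]}.
    """
    payload = json.loads(response_body)
    return [job["id"] for job in payload.get("jobs", [])
            if job.get("status") != "FINISHED"]
```

An empty result would suggest every job in the deployment has run to completion.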
Feb 22 2024, 5:59 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Documentation, Discovery-Search (Current work)
EBernhardson added a comment to T356303: Review wikitech:Search and write processes for k8s world.

On further review, simply documenting the various commands to run seemed error prone. Attached patch adds a python script that simplifies away most of the reindexing and backfill to ease future burden.

Feb 22 2024, 12:25 AM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Documentation, Discovery-Search (Current work)

Feb 15 2024

EBernhardson added a comment to T356526: High level of backend errors for CirrusSearch jobs in jobrunners.

Was supposed to be in the backport window today, but train problems blocked that. This is a pretty safe patch though, i'll ship it a little later.

Feb 15 2024, 10:36 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), serviceops, Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T356526: High level of backend errors for CirrusSearch jobs in jobrunners.

It seems the patch didn't actually make it into wmf.18 as expected, jenkins-bot never finished the merge so this was only deployed in wmf.17. I'll get it shipped there too.

Feb 15 2024, 7:01 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), serviceops, Discovery-Search (Current work), CirrusSearch

Feb 14 2024

EBernhardson added a comment to T356303: Review wikitech:Search and write processes for k8s world.

I've been reviewing our options for backfilling and trying to come up with a plan, i think the following will work:

Feb 14 2024, 10:34 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Documentation, Discovery-Search (Current work)
EBernhardson moved T356655: Create tool and process to investigate Search update Pipeline failures from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Feb 14 2024, 6:45 PM · Discovery-Search (Current work)
EBernhardson moved T356526: High level of backend errors for CirrusSearch jobs in jobrunners from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.

This looks resolved now, the bi-hourly spikes have gone away since the monday deployment.

Feb 14 2024, 5:31 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), serviceops, Discovery-Search (Current work), CirrusSearch

Feb 12 2024

EBernhardson moved T357353: Application Security Review Request : NetworkSession MediaWiki extension from Incoming to Blocked/Waiting on the Discovery-Search (Current work) board.
Feb 12 2024, 10:02 PM · Discovery-Search (Current work), secscrum, Security, Application Security Reviews
EBernhardson added a project to T357353: Application Security Review Request : NetworkSession MediaWiki extension : Discovery-Search (Current work).
Feb 12 2024, 10:02 PM · Discovery-Search (Current work), secscrum, Security, Application Security Reviews
EBernhardson added a subtask for T355267: Add extension NetworkSession to all wmf wikis: T357353: Application Security Review Request : NetworkSession MediaWiki extension .
Feb 12 2024, 9:55 PM · Discovery-Search (Current work), Wikimedia-extension-review-queue, Wikimedia-Extension-setup
EBernhardson added a parent task for T357353: Application Security Review Request : NetworkSession MediaWiki extension : T355267: Add extension NetworkSession to all wmf wikis.
Feb 12 2024, 9:55 PM · Discovery-Search (Current work), secscrum, Security, Application Security Reviews
EBernhardson created T357353: Application Security Review Request : NetworkSession MediaWiki extension .
Feb 12 2024, 9:54 PM · Discovery-Search (Current work), secscrum, Security, Application Security Reviews
EBernhardson created P56688 (An Untitled Masterwork).
Feb 12 2024, 9:33 PM
EBernhardson moved T354976: Create new NetworkSession mediawiki extension from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.

Updated the mw.org page with the latest changes, so it's now in line with the repository. I think this is enough to call this ticket complete. T355267 is the task for deploying this extension to the wikis.

Feb 12 2024, 9:20 PM · MW-Interfaces-Team, serviceops-radar, NetworkSession, Patch-For-Review, API Platform, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch
EBernhardson added a subtask for T345185: Provide a method for internal services to run api requests for private wikis: T355267: Add extension NetworkSession to all wmf wikis.
Feb 12 2024, 9:19 PM · MW-1.42-notes (1.42.0-wmf.16; 2024-01-30), API Platform, serviceops, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch
EBernhardson added a parent task for T355267: Add extension NetworkSession to all wmf wikis: T345185: Provide a method for internal services to run api requests for private wikis.
Feb 12 2024, 9:19 PM · Discovery-Search (Current work), Wikimedia-extension-review-queue, Wikimedia-Extension-setup
EBernhardson added a comment to T356651: Rebuild and deploy textify plugin.

Released the plugin as -wmf12. Patch above updates the .deb to use the newest versions. MR also up on gitlab to update the dev image (for cindy/dev envs) to use the new .deb once available.

Feb 12 2024, 8:37 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Discovery-Search (Current work)
EBernhardson claimed T356439: [Tracking] Evaluate differences in saneitizer fixes eqiad vs cloudelastic.
Feb 12 2024, 4:16 PM · Discovery-Search (Current work)

Feb 9 2024

EBernhardson added a comment to T356526: High level of backend errors for CirrusSearch jobs in jobrunners.

If we need them silenced, best bet is probably to re-enable the writes for these wikis. Can be done with a mediawiki-config patch.

Feb 9 2024, 5:49 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), serviceops, Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T356526: High level of backend errors for CirrusSearch jobs in jobrunners.

I haven't managed to track down where the "Received cirrusSearchElasticaWrite job for unwritable cluster cloudelastic" error comes from. We recently turned off writes to this cluster from mediawiki on select wikis, but somewhere in the codebase something is still trying to create writes even though it shouldn't. Needs more investigation on our side.

Feb 9 2024, 4:05 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), serviceops, Discovery-Search (Current work), CirrusSearch

Feb 8 2024

EBernhardson claimed T356655: Create tool and process to investigate Search update Pipeline failures.
Feb 8 2024, 11:07 PM · Discovery-Search (Current work)

Feb 7 2024

EBernhardson created P56486 (An Untitled Masterwork).
Feb 7 2024, 10:43 PM
EBernhardson updated the task description for T356933: Streaming Updater should still make forward progress when one index has problems.
Feb 7 2024, 10:29 PM · Discovery-Search (Current work)
EBernhardson created T356933: Streaming Updater should still make forward progress when one index has problems.
Feb 7 2024, 10:26 PM · Discovery-Search (Current work)
EBernhardson moved T354976: Create new NetworkSession mediawiki extension from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Feb 7 2024, 4:57 PM · MW-Interfaces-Team, serviceops-radar, NetworkSession, Patch-For-Review, API Platform, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch

Feb 5 2024

EBernhardson added a comment to T356655: Create tool and process to investigate Search update Pipeline failures.

Current process (to be refined). None of this is committed anywhere yet, mostly working out what is going to work.

Feb 5 2024, 10:12 PM · Discovery-Search (Current work)
EBernhardson added a comment to T356439: [Tracking] Evaluate differences in saneitizer fixes eqiad vs cloudelastic.
Feb 5 2024, 10:11 PM · Discovery-Search (Current work)
EBernhardson added a comment to T356655: Create tool and process to investigate Search update Pipeline failures.

Idea is something like:

Feb 5 2024, 5:12 PM · Discovery-Search (Current work)
EBernhardson moved T356526: High level of backend errors for CirrusSearch jobs in jobrunners from Incoming to To Be Deployed on the Discovery-Search (Current work) board.

This is a bit of a non error. What happened is:

Feb 5 2024, 4:35 PM · MW-1.42-notes (1.42.0-wmf.18; 2024-02-13), serviceops, Discovery-Search (Current work), CirrusSearch
EBernhardson removed a project from T356302: setup production Cirrus Streaming Updater alerts : Epic.
Feb 5 2024, 4:12 PM · Discovery-Search (Current work)

Feb 1 2024

EBernhardson added a comment to T356439: [Tracking] Evaluate differences in saneitizer fixes eqiad vs cloudelastic.

Started with the ghost page in index errors, since there are only a couple. We have two pages in cloudelastic for frwiki that have been correctly deleted in eqiad but still exist in cloudelastic:

Feb 1 2024, 11:44 PM · Discovery-Search (Current work)
EBernhardson added a project to T356439: [Tracking] Evaluate differences in saneitizer fixes eqiad vs cloudelastic: Discovery-Search (Current work).
Feb 1 2024, 8:18 PM · Discovery-Search (Current work)
EBernhardson created T356439: [Tracking] Evaluate differences in saneitizer fixes eqiad vs cloudelastic.
Feb 1 2024, 8:18 PM · Discovery-Search (Current work)
EBernhardson updated the task description for T356438: Add NetworkSession mediawiki extension to translatewiki.net.
Feb 1 2024, 8:08 PM · translatewiki.net
EBernhardson added a comment to T354976: Create new NetworkSession mediawiki extension.

Localization - The only localization is the extension description, unclear if necessary (or how).

Feb 1 2024, 8:08 PM · MW-Interfaces-Team, serviceops-radar, NetworkSession, Patch-For-Review, API Platform, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch
EBernhardson added a subtask for T354976: Create new NetworkSession mediawiki extension: T356438: Add NetworkSession mediawiki extension to translatewiki.net.
Feb 1 2024, 8:07 PM · MW-Interfaces-Team, serviceops-radar, NetworkSession, Patch-For-Review, API Platform, Discovery-Search (Current work), MediaWiki-Configuration, CirrusSearch