EBernhardson (EBernhardson)
User

User Details

User Since
Oct 7 2014, 4:49 PM (316 w, 5 h)
Availability
Available
LDAP User
EBernhardson
MediaWiki User
EBernhardson (WMF)

Recent Activity

Yesterday

EBernhardson added a comment to T266495: Create Debian Package for Flink.

I don't know if it's relevant at all, but analytics is in the process of switching to Apache Bigtop (from Cloudera Hadoop). That includes Flink debs; would we want to use those?

Mon, Oct 26, 6:47 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
EBernhardson added a comment to T239931: Reduce the impact of the sanitizer on wikidata.

Saneitizer was turned back on last week, everything there is working well and wikidata can be reenabled any time.

Mon, Oct 26, 6:14 PM · Wikibase wb_terms leftovers 2020, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Wikidata

Wed, Oct 21

EBernhardson moved T263073: Large, steady increase in unprocessed cloudelastic job.cirrusSearchElasticaWrite messages from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Wed, Oct 21, 10:30 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T263073: Large, steady increase in unprocessed cloudelastic job.cirrusSearchElasticaWrite messages.

Saneitizer has been running, not seeing any backlog of jobs building up this time around. Everything seems to be in order now.

Wed, Oct 21, 10:29 PM · Patch-For-Review, Discovery-Search (Current work)

Tue, Oct 20

EBernhardson committed rWDAN629e8bc4c6c7: search satisfaction: remove now unused y/m/d cli args (authored by EBernhardson).
search satisfaction: remove now unused y/m/d cli args
Tue, Oct 20, 3:53 PM

Mon, Oct 19

EBernhardson committed rWDAN4bfd6c9d8ecc: hive/spark: Column names are case insensitive (authored by EBernhardson).
hive/spark: Column names are case insensitive
Mon, Oct 19, 11:09 PM
EBernhardson moved T261239: Reboot (restart) Elasticsearch nodes from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mon, Oct 19, 9:56 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson committed rWDAN94c23a17b52d: Also fix column mismatch writing articletopic predictions (authored by EBernhardson).
Also fix column mismatch writing articletopic predictions
Mon, Oct 19, 9:16 PM
EBernhardson committed rWDANe66bec217945: Fix same column mismatch loading wikibase_item (authored by EBernhardson).
Fix same column mismatch loading wikibase_item
Mon, Oct 19, 8:15 PM
EBernhardson added a comment to T238151: Tune Glent Method 1 algorithm.

It looks like we are currently running glent 0.2.3 which includes the patches referenced above. Checked the attached patches and it looks like everything is shipped. Should we close this and move on to figuring out how we want to put it in front of users?

Mon, Oct 19, 6:48 PM · Discovery-Search (Current work)
EBernhardson created P13023 (An Untitled Masterwork).
Mon, Oct 19, 6:11 PM
EBernhardson moved T265914: Investigate Resource Needs for Commons and Wikidata Elasticsearch indices from Incoming to Ready for Development on the Discovery-Search (Current work) board.
Mon, Oct 19, 5:38 PM · Discovery-Search (Current work)
EBernhardson set the point value for T265914: Investigate Resource Needs for Commons and Wikidata Elasticsearch indices to 8.
Mon, Oct 19, 5:37 PM · Discovery-Search (Current work)
EBernhardson moved T265699: 40-elasticsearch-readahead udev rule failing for cloudelastic100[5,6] from Incoming to Ready for Development on the Discovery-Search (Current work) board.
Mon, Oct 19, 5:30 PM · Discovery-Search (Current work)
EBernhardson set the point value for T265699: 40-elasticsearch-readahead udev rule failing for cloudelastic100[5,6] to 5.
Mon, Oct 19, 5:28 PM · Discovery-Search (Current work)
EBernhardson moved T265547: Replace mjolnir venv deployment scheme in analytics from Incoming to Ready for Development on the Discovery-Search (Current work) board.
Mon, Oct 19, 5:26 PM · Discovery-Search (Current work)
EBernhardson set the point value for T265547: Replace mjolnir venv deployment scheme in analytics to 8.
Mon, Oct 19, 5:24 PM · Discovery-Search (Current work)
EBernhardson moved T265452: Add a configurable restart strategy to the streaming updater from Incoming to Ready for Development on the Discovery-Search (Current work) board.
Mon, Oct 19, 5:23 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
EBernhardson set the point value for T265452: Add a configurable restart strategy to the streaming updater to 3.
Mon, Oct 19, 5:23 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
EBernhardson moved T265081: Fix Chinese Analysis Chain for Glent M2 from Incoming to Ready for Development on the Discovery-Search (Current work) board.
Mon, Oct 19, 5:20 PM · Discovery-Search (Current work), Chinese-Sites
EBernhardson set the point value for T265081: Fix Chinese Analysis Chain for Glent M2 to 3.
Mon, Oct 19, 5:19 PM · Discovery-Search (Current work), Chinese-Sites
EBernhardson edited projects for T265056: Cirrus Search dumps failed for some wikis, added: Discovery-Search; removed Discovery-Search (Current work).
Mon, Oct 19, 5:15 PM · Discovery-Search, CirrusSearch, Dumps-Generation
EBernhardson set the point value for T264877: SonarQube should analyze all Search Platform projects to 8.
Mon, Oct 19, 5:12 PM · Discovery-Search (Current work), User-zeljkofilipin, Code-Health
EBernhardson moved T264877: SonarQube should analyze all Search Platform projects from Incoming to Ready for Development on the Discovery-Search (Current work) board.
Mon, Oct 19, 5:08 PM · Discovery-Search (Current work), User-zeljkofilipin, Code-Health
EBernhardson moved T264873: Ensure that SonarQube is commenting on gerrit code reviews of the Search Platform team from Incoming to Ready for Development on the Discovery-Search (Current work) board.
Mon, Oct 19, 5:08 PM · Discovery-Search (Current work), Code-Health
EBernhardson set the point value for T264873: Ensure that SonarQube is commenting on gerrit code reviews of the Search Platform team to 3.
Mon, Oct 19, 5:07 PM · Discovery-Search (Current work), Code-Health
EBernhardson added a comment to T264873: Ensure that SonarQube is commenting on gerrit code reviews of the Search Platform team.

Does/Should this ticket include sonarqube for our python and scala projects?

Mon, Oct 19, 5:07 PM · Discovery-Search (Current work), Code-Health
EBernhardson moved T264659: Update BAG & BRT SPARQL endpoint in the whitelist from Incoming to Ready for Development on the Discovery-Search (Current work) board.
Mon, Oct 19, 5:05 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata, Wikimedia-Site-requests
EBernhardson set the point value for T264659: Update BAG & BRT SPARQL endpoint in the whitelist to 1.
Mon, Oct 19, 5:04 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata, Wikimedia-Site-requests
EBernhardson updated the task description for T265641: Build integration test suite for search platform airflow + hadoop + spark integration.
Mon, Oct 19, 4:08 PM · Discovery-Search
EBernhardson triaged T265452: Add a configurable restart strategy to the streaming updater as High priority.
Mon, Oct 19, 3:41 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
EBernhardson moved T265452: Add a configurable restart strategy to the streaming updater from All WDQS-related tasks to Current work on the Wikidata-Query-Service board.
Mon, Oct 19, 3:41 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
EBernhardson moved T265547: Replace mjolnir venv deployment scheme in analytics from needs triage to Current work on the Discovery-Search board.
Mon, Oct 19, 3:31 PM · Discovery-Search (Current work)
EBernhardson triaged T265547: Replace mjolnir venv deployment scheme in analytics as High priority.
Mon, Oct 19, 3:30 PM · Discovery-Search (Current work)
EBernhardson raised the priority of T265641: Build integration test suite for search platform airflow + hadoop + spark integration from Medium to High.
Mon, Oct 19, 3:22 PM · Discovery-Search
EBernhardson triaged T265641: Build integration test suite for search platform airflow + hadoop + spark integration as Medium priority.
Mon, Oct 19, 3:21 PM · Discovery-Search
EBernhardson moved T265641: Build integration test suite for search platform airflow + hadoop + spark integration from needs triage to ML & Data Pipeline on the Discovery-Search board.
Mon, Oct 19, 3:21 PM · Discovery-Search
EBernhardson moved T265699: 40-elasticsearch-readahead udev rule failing for cloudelastic100[5,6] from needs triage to Current work on the Discovery-Search board.
Mon, Oct 19, 3:20 PM · Discovery-Search (Current work)
EBernhardson assigned T265908: "ElasticSearch shard size check" icinga warnings on cloudelastic servers to RKemper.
Mon, Oct 19, 3:19 PM · Discovery-Search (Current work), cloud-services-team (Kanban)
EBernhardson moved T265908: "ElasticSearch shard size check" icinga warnings on cloudelastic servers from needs triage to Current work on the Discovery-Search board.
Mon, Oct 19, 3:18 PM · Discovery-Search (Current work), cloud-services-team (Kanban)

Fri, Oct 16

EBernhardson committed rWDAN5731d0533969: convert_to_esbulk: Implement multilist handler from super_detect_noop (authored by EBernhardson).
convert_to_esbulk: Implement multilist handler from super_detect_noop
Fri, Oct 16, 5:37 PM
EBernhardson committed rWDAN27d0b01a620f: cirrus namespace map: Align output columns with table (authored by EBernhardson).
cirrus namespace map: Align output columns with table
Fri, Oct 16, 7:05 AM

Thu, Oct 15

EBernhardson added a comment to T259979: Redirect wikimedia.org/research to research.wikimedia.org instead of some external closed survey.

I think this would typically go in https://wikitech.wikimedia.org/wiki/Puppet_request_window. These should be every Tues/Thurs at 16:00 UTC and can be found in the deployments calendar.

Thu, Oct 15, 11:02 PM · Research, Wikimedia-Apache-configuration, Patch-For-Review, Operations
EBernhardson updated the task description for T265641: Build integration test suite for search platform airflow + hadoop + spark integration.
Thu, Oct 15, 9:52 PM · Discovery-Search
EBernhardson added a comment to T265641: Build integration test suite for search platform airflow + hadoop + spark integration.

One option would be faking a prod setup with some docker images. Cloudera (our hadoop distribution) used to (may still) provide a docker image that stands up hdfs + hive + related stuff. We ought to be able to set up a second image for integration testing that can talk to this.

Thu, Oct 15, 9:51 PM · Discovery-Search
EBernhardson updated the task description for T265641: Build integration test suite for search platform airflow + hadoop + spark integration.
Thu, Oct 15, 9:49 PM · Discovery-Search
EBernhardson committed rWDAN88e12838979a: wmf_spark: Fix failure to generate partition cond when empty (authored by EBernhardson).
wmf_spark: Fix failure to generate partition cond when empty
Thu, Oct 15, 7:37 PM
EBernhardson updated the task description for T265641: Build integration test suite for search platform airflow + hadoop + spark integration.
Thu, Oct 15, 5:46 PM · Discovery-Search
EBernhardson created T265641: Build integration test suite for search platform airflow + hadoop + spark integration.
Thu, Oct 15, 5:46 PM · Discovery-Search
EBernhardson committed rWDAN500bdad551fe: spark: correctly parse non-partitioned partition spec (authored by EBernhardson).
spark: correctly parse non-partitioned partition spec
Thu, Oct 15, 3:06 PM

Wed, Oct 14

EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

Reviewed deployed mitigations:

Wed, Oct 14, 11:25 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson committed rWDAN04548dd74779: Centralize reading/writing to hive (authored by EBernhardson).
Centralize reading/writing to hive
Wed, Oct 14, 10:36 PM
EBernhardson created T265547: Replace mjolnir venv deployment scheme in analytics.
Wed, Oct 14, 9:23 PM · Discovery-Search (Current work)
EBernhardson claimed T237364: Write Glent M0 A/B test report.
Wed, Oct 14, 6:51 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson moved T237364: Write Glent M0 A/B test report from Ready for Development to Needs review on the Discovery-Search (Current work) board.
Wed, Oct 14, 6:51 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T237364: Write Glent M0 A/B test report.

Between March 13, 2020 and May 15, 2020 (63 days, 9 weeks) 50% of Special:Search traffic to enwiki, dewiki and frwiki was augmented with glent session-similarity (Method 0) query suggestions. Over this time glent had the opportunity to provide suggestions to 67 million search requests. Across all metrics measured, the inclusion of session-similarity based query suggestions yields small but measurably significant improvements. The rest of the report will look more specifically into conversion rates of the various steps in the user flow between issuing a query and satisfying the user's information need.

Wed, Oct 14, 6:36 PM · Discovery-Search (Current work), CirrusSearch

Tue, Oct 13

EBernhardson added a comment to T239931: Reduce the impact of the sanitizer on wikidata.

All seems reasonable to me. Note that the saneitizer is globally turned off as of a week ago due to a separate incident. Expecting to turn that back on this week.

Tue, Oct 13, 5:52 PM · Wikibase wb_terms leftovers 2020, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Patch-For-Review, Discovery-Search (Current work), CirrusSearch, Wikidata

Mon, Oct 12

EBernhardson committed rWDAN77febb6fcda2: Parameterize active mediawiki datacenter (authored by EBernhardson).
Parameterize active mediawiki datacenter
Mon, Oct 12, 6:55 AM

Fri, Oct 9

EBernhardson added a comment to T265164: Evaluate usage of MediaWiki-Vagrant by technical contributors.

I use vagrant for all mediawiki and extension development. This is primarily because other environments I've tried do not set up families of language wikis along with commons, which I need for CirrusSearch development.

Fri, Oct 9, 8:54 PM · User-bd808, MediaWiki-Vagrant

Thu, Oct 8

EBernhardson committed rWDANa9239495cce7: search_satisfaction: align druid datasource with reality (authored by EBernhardson).
search_satisfaction: align druid datasource with reality
Thu, Oct 8, 9:38 PM
EBernhardson committed rWDAN48a0d410130d: hdfs_to_druid: Log exact spec sent to druid (authored by EBernhardson).
hdfs_to_druid: Log exact spec sent to druid
Thu, Oct 8, 9:38 PM
EBernhardson committed rWDAN3b114434d7fe: search_satisfaction: Alias sample multiplier to expected name (authored by EBernhardson).
search_satisfaction: Alias sample multiplier to expected name
Thu, Oct 8, 7:30 PM
EBernhardson added a comment to T264632: [Epic] Longer term plan for increases in Elasticsearch cluster disk IO.

Perhaps a silly thought, but one problem we have is that it's hard to directly measure how much cache memory we need. Since we have swap turned off on these machines we can simulate having less memory by having some other application simply allocate the memory (have python create a 10GB string). Can start at a low value and increase a couple GB at a time until IO starts climbing. We would then have a direct measure of how much memory was required (at that moment in time).
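
A minimal sketch of the memory-pressure experiment described in this comment, assuming a plain Python process is acceptable as the "other application". The function name `occupy_memory` and its parameters are hypothetical, not an existing tool. It allocates memory in fixed steps, keeps every block referenced, and pauses between steps so disk IO can be observed at each level.

```python
import time

GIB = 1024 ** 3


def occupy_memory(step_bytes, max_bytes, pause_seconds=0.0):
    """Allocate memory in steps, keep it referenced, and yield the total held so far."""
    held = []  # keeping references prevents the blocks from being freed
    total = 0
    while total + step_bytes <= max_bytes:
        # A non-zero fill writes to every page, so the kernel must back the
        # allocation with real memory; a zero-filled buffer may be mapped lazily.
        held.append(b"\x01" * step_bytes)
        total += step_bytes
        yield total
        # Hold the current level for a while so IO metrics can settle.
        time.sleep(pause_seconds)


# Hypothetical usage: grow by 2 GiB per hour up to 10 GiB while watching IO.
# for total in occupy_memory(2 * GIB, 10 * GIB, pause_seconds=3600):
#     print(f"holding {total / GIB:.0f} GiB")
```

With swap off, the level at which IO starts climbing would approximate how much page cache the host needed at that moment. The memory is released once the generator finishes or is discarded.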

Thu, Oct 8, 6:05 PM · Epic, Discovery-Search
EBernhardson committed rWDAN945e5c141bd6: Set search satisfaction dag start date to oldest currently available data (authored by EBernhardson).
Set search satisfaction dag start date to oldest currently available data
Thu, Oct 8, 5:57 PM

Wed, Oct 7

EBernhardson committed rWDAN7fa787ed5f46: mjolnir: Reduce training dataset by 10% (authored by EBernhardson).
mjolnir: Reduce training dataset by 10%
Wed, Oct 7, 8:49 PM
EBernhardson added a comment to T237364: Write Glent M0 A/B test report.

The relevant metrics we have stored are in superset: https://superset.wikimedia.org/r/334

Wed, Oct 7, 8:37 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson created P12950 (An Untitled Masterwork).
Wed, Oct 7, 6:50 PM
EBernhardson updated the task description for T257118: Beta cluster has reached its quota.
Wed, Oct 7, 12:47 AM · Beta-Cluster-Infrastructure

Tue, Oct 6

EBernhardson added a comment to T264632: [Epic] Longer term plan for increases in Elasticsearch cluster disk IO.

To really have a plan, I would like to know the trajectory of our hot set. Unfortunately, I'm not really sure how to measure the hot set. The only data points I can think of at the moment are times when we ran out of IO, and estimating from there. Also these "total cache" numbers aren't exactly the cache size, rather the estimate of referenced mmap pages (vs loaded but unreferenced) at the point when our servers started to have issues.

Tue, Oct 6, 3:27 PM · Epic, Discovery-Search

Mon, Oct 5

EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

restore wikidatawiki_content enwiki_content enwiki_general and commonswiki_file to default index.merge.policy.deletes_pct_allowed on eqiad cirrus cluster

Mon, Oct 5, 10:03 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

Applied P5883 from the previous incident to elastic2025-55 (56+ have 256G memory and aren't having issues). Aggregate read rate across codfw declined from 3GB/s to 500MB/s. This program basically turns off readahead for all currently open files of the elasticsearch process.

Mon, Oct 5, 9:59 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

We might want to let it run a day or two to have more confidence, but the immediate results of reducing readahead look like a win. For reference the changes applied were 64kB for 2050, 32kB for 2051, and 16kB for 2052. These all show a dramatic reduction in IO; 2052 is the most dramatic, going from 200+ MB/s down to 30 MB/s of IO required. iops are slightly down, from ~2.75k to ~2k. This suggests most of the IO reduction comes from the smaller size of requests, but also fewer requests are being performed as a result of fewer unused pages going into the page cache.
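
The reasoning about request size can be checked with a little arithmetic. This is a rough sketch using the approximate figures quoted for 2052 (200 MB/s and 2.75k iops before, 30 MB/s and 2k iops after); the helper name is hypothetical and the inputs are rounded readings, not exact measurements.

```python
def avg_request_kb(throughput_mb_s, iops):
    """Average size of one read request in kB (decimal units)."""
    return throughput_mb_s * 1000 / iops


before_kb = avg_request_kb(200, 2750)  # roughly 73 kB per request
after_kb = avg_request_kb(30, 2000)    # 15 kB per request

# Split the overall drop in throughput into its two factors.
size_factor = after_kb / before_kb     # contribution of smaller requests
count_factor = 2000 / 2750             # contribution of fewer requests
total_factor = 30 / 200                # overall throughput ratio

print(f"request size: {before_kb:.1f} kB -> {after_kb:.1f} kB")
print(f"size x{size_factor:.2f}, count x{count_factor:.2f}, total x{total_factor:.2f}")
```

Under these numbers the average request shrinks to about a fifth of its former size while the request count falls by only about a quarter, which is consistent with the smaller request size accounting for most of the reduction.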

Mon, Oct 5, 9:02 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

elastic2050 to take reduced (128kB) readahead settings

Mon, Oct 5, 8:14 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

One final option we can consider: the current readahead of 256 512-byte sectors (=128kB) is entirely arbitrary[1]. We had some servers with a large value that were performing poorly, and servers with this value that were performing acceptably. We could try cutting this in half.
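
As a quick check of the unit conversion, the sketch below converts a readahead value expressed in 512-byte sectors to kB (1 kB = 1024 bytes here) and back. The function names are hypothetical helpers for illustration only.

```python
SECTOR_BYTES = 512


def sectors_to_kb(sectors):
    """Readahead given in 512-byte sectors, expressed in kB (1024 bytes)."""
    return sectors * SECTOR_BYTES // 1024


def kb_to_sectors(kb):
    """Readahead given in kB (1024 bytes), expressed in 512-byte sectors."""
    return kb * 1024 // SECTOR_BYTES


current = 256                      # sectors, as quoted in the comment
print(sectors_to_kb(current))      # 128
print(sectors_to_kb(current // 2)) # 64, the "cut in half" option
print(kb_to_sectors(16))           # 32
```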

Mon, Oct 5, 7:45 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

Increase in cache duration will need a few days to take full effect. After that we can review cache metrics to verify we achieved the desired effect of reducing the more_like query rate, along with verifying that effects a reduction in our IO usage. Looking for elastic2025-54 to get much closer to elastic2056-60 (which have more memory).

Mon, Oct 5, 7:03 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T263073: Large, steady increase in unprocessed cloudelastic job.cirrusSearchElasticaWrite messages.

The backlog of jobs has fallen out of the job queue retention window, basically the timespan has been lost. We either need to re-import the indices to cloudelastic or accept waiting 2 months for the saneitizer to work its way through (once we turn it back on, which is waiting for the above patch to repair the envoy deployment).

Mon, Oct 5, 6:57 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson claimed T264053: Unsustainable increases in Elasticsearch cluster disk IO.
Mon, Oct 5, 5:05 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

Followup on merge scheduler changes, the four indices that were updated with 20% deleted pct in eqiad:

Mon, Oct 5, 5:01 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson updated subscribers of T264566: Redirected entity still present in search results after 6 months.

The process that fixes these was turned off at wikidata's request. Mostly that means if somehow a delete is missed it will never be fixed.

Mon, Oct 5, 3:27 PM · Discovery-Search (Current work), Elasticsearch, Wikidata

Fri, Oct 2

EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

No particular changes observed updating the max merge threads. Not particularly unexpected, nothing was complaining about throttling but seemed easy to test.

Fri, Oct 2, 9:30 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson committed rWDAN54396682b455: oozie: query_clicks needs to look for events in codfw during switchover (authored by EBernhardson).
oozie: query_clicks needs to look for events in codfw during switchover
Fri, Oct 2, 6:14 PM
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

As a thought, this is an in-memory cache but due to edge caching we don't hit it all that often (hit rate 50-70%, good but nothing crazy). Maybe there is a disk-backed cache we can swap in instead of wan object cache, in which case 40G would be inconsequential.

Fri, Oct 2, 4:50 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

Decided there isn't much risk in changing the merge scheduler on the idle cluster, so I've updated our four largest indices (commonswiki_file, wikidatawiki_content, enwiki_general, enwiki_content) to have merge.scheduler.max_thread_count = 4 on the eqiad cluster. Some stats about these indices recorded below to compare after the weekend.
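
For illustration, a change like this could be expressed as one update-settings request per index. The sketch below only builds the request paths and JSON bodies and sends nothing; the helper names are hypothetical, and it assumes the standard Elasticsearch `PUT /<index>/_settings` endpoint rather than whatever tooling was actually used.

```python
import json

INDICES = [
    "commonswiki_file",
    "wikidatawiki_content",
    "enwiki_general",
    "enwiki_content",
]


def merge_scheduler_settings(max_thread_count):
    """Settings body that sets index.merge.scheduler.max_thread_count."""
    return {"index": {"merge": {"scheduler": {"max_thread_count": max_thread_count}}}}


def settings_requests(indices, max_thread_count):
    """Return (path, json_body) pairs, one PUT /<index>/_settings per index."""
    body = json.dumps(merge_scheduler_settings(max_thread_count))
    return [(f"/{name}/_settings", body) for name in indices]


for path, body in settings_requests(INDICES, 4):
    print("PUT", path, body)
```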

Fri, Oct 2, 4:43 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

David mentioned the deleted docs count, and I've been somewhat suspicious of that as well. Reviewing our config, we override the default merge configuration to use a single thread (instead of default 4 per 6.5.4 docs). This is per-shard, so a single instance may still perform many parallel merges. Looking into the relevant elasticsearch bits, they create info level logs when throttling merge requests and I can't find any matching logs. Likely merge isn't an issue, but not 100% on that.

Fri, Oct 2, 1:37 AM · Patch-For-Review, Discovery-Search (Current work)

Thu, Oct 1

EBernhardson committed rWDAN6101b5672632: Increase training memory overhead (authored by EBernhardson).
Increase training memory overhead
Thu, Oct 1, 11:48 PM
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

Increase in cache duration will need a few days to take full effect. After that we can review cache metrics to verify we achieved the desired effect of reducing the more_like query rate, along with verifying that effects a reduction in our IO usage. Looking for elastic2025-54 to get much closer to elastic2056-60 (which have more memory).

Thu, Oct 1, 8:18 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

Overall, 200M pages and 2kb gives us 400GB.
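
The estimate is a straight multiplication, sketched here in decimal units (1 kB = 1000 bytes, 1 GB = 10^9 bytes). The variable names are illustrative, and the 2 kB figure is taken as a per-page size as the comment implies.

```python
pages = 200_000_000       # 200M pages
bytes_per_page = 2_000    # 2 kB each

total_gb = pages * bytes_per_page / 1e9
print(f"{total_gb:.0f} GB")
```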

Thu, Oct 1, 1:24 AM · Patch-For-Review, Discovery-Search (Current work)

Wed, Sep 30

EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

Increase cache on more_like queries from the current 24h to 3 or maybe 7 days.

Wed, Sep 30, 10:53 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

From our meeting this morning

Wed, Sep 30, 10:24 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson updated the title for P12863 sample of morelike recommendations on enwiki sept 9-16 from untitled to sample of morelike recommendations on enwiki sept 9-16.
Wed, Sep 30, 9:32 PM
EBernhardson created P12863 sample of morelike recommendations on enwiki sept 9-16.
Wed, Sep 30, 9:29 PM
EBernhardson added a comment to T263841: RFC: Expand API title generator to support other generated data.

One potential question, is this going to be a generic implementation where all generators suddenly expose their properties, or are we expecting to implement this per-generator where it's requested?

Wed, Sep 30, 6:49 PM · Platform Engineering, Structured-Data-Backlog (Current Work), TechCom-RFC
EBernhardson added a comment to T263781: data request: Search Interleaving Dataset.

the analysis portion of the linked ab test: https://github.com/wikimedia-research/SD-search-analysis-2020/blob/master/T261759%20-%20AB-test%20analysis.ipynb

Wed, Sep 30, 6:12 PM · Discovery-Search (Current work)
EBernhardson created P12859 (An Untitled Masterwork).
Wed, Sep 30, 3:50 PM
EBernhardson added a comment to T258738: Build query-clicks dataset from SearchSatisfaction logging.

I moved this back to ready for development to reflect reality, some work has been done on migrating the existing hql to spark but various production issues keep interrupting and taking precedence.

Wed, Sep 30, 2:51 AM · Discovery-Search (Current work)

Tue, Sep 29

EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

From talking to @dcausse this morning and further investigation, the shard count increases deployed in July seem to have exacerbated the situation. I'm not convinced this is the start of our problems though. From above, we were already seeing 50+MB/sec of read while we expect closer to ~10MB/sec when the hot set fits in memory. Wish we had more historical data on this, but it seems to be the limits of our prometheus retention.

Tue, Sep 29, 10:31 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

Data sizes are up quite a bit[1] since June (as far back as we have history). Data filesystem usage as reported by elasticsearch has increased from 17TB in late May to a peak of 23TB in mid September. Interestingly, data sizes have decreased slightly to 22TB in the last few weeks. This amounts to a 1/3 increase in data size in only 4 months.

Tue, Sep 29, 10:23 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T263875: Develop a new schema for MediaSearch analytics or adapt an existing one.

If we want to track the interface language that visitors to Special:MediaSearch are using, should we record the value returned from mw.language.getFallbackLanguageChain()? That would provide an array of language codes like [ "de", "en" ], etc. Would it be sufficient to just store the first code, or should we store all of them?

Tue, Sep 29, 8:43 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Analytics-Radar, Patch-For-Review, Product-Analytics, Structured-Data-Backlog (Current Work), Structured Data Engineering
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

As additional information reinforcing the theory that we are running out of memory, the latest machines we have added have 256G, vs the older 128G machines. Just before the cluster switchover[1] we see 1032 - 1052 performing > 100MB/s of io, while 1053 - 1067 are at ~10MB/s (expected historical steady-state).

Tue, Sep 29, 4:38 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

We are usually approaching daily peak load about now, read rate across the cluster is ~4GB/s, all of the last 7 days peaked at > 6GB/s, suggesting the mitigation of reducing commonswiki queries is having some impact.

Tue, Sep 29, 4:22 PM · Patch-For-Review, Discovery-Search (Current work)