EBernhardson (EBernhardson)
User

User Details

User Since
Oct 7 2014, 4:49 PM (340 w, 5 d)
Availability
Available
LDAP User
EBernhardson
MediaWiki User
EBernhardson (WMF) [ Global Accounts ]

Recent Activity

Thu, Apr 15

EBernhardson added a comment to T280294: Big increase in eventlogging_SearchSatisfaction validation errors after this week's MW train.

It looks like the source of this is I2bcd7305 from T210106, which changed the logged value from '0 edits' to null for most logged requests. I don't know who is using this info downstream, so the safest approach seems to be to transform null back into '0 edits'.
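
A minimal sketch of the null-to-default transform suggested above. The field name `edit_count_bucket` and the sample event are hypothetical; only the '0 edits' default comes from the comment.

```python
from typing import Optional

# Default taken from the comment above; everything else is illustrative.
DEFAULT_BUCKET = "0 edits"


def normalize_edit_bucket(value: Optional[str]) -> str:
    """Map a missing (null) bucket back to the value logged before the change."""
    return DEFAULT_BUCKET if value is None else value


# Hypothetical event; the real schema field name is not given in the comment.
event = {"edit_count_bucket": None}
event["edit_count_bucket"] = normalize_edit_bucket(event["edit_count_bucket"])
```

Non-null values pass through unchanged, so downstream consumers would see the same set of values as before the change.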

Thu, Apr 15, 9:33 PM · MW-1.37-notes (1.37.0-wmf.3; 2021-04-27), Discovery

Wed, Apr 14

EBernhardson added a comment to T279009: Cleanup duplicate indices in cloudelastic.

Pondering this, first step should probably be closing rather than deleting the indices. Closed indices can be easily reopened if we start getting errors from CirrusSearch that we closed an active index. Without errors after some reasonable time period the indices can be safely deleted.
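
As a rough sketch of the close-then-delete approach: Elasticsearch exposes `POST /<index>/_close` and `POST /<index>/_open`, so a closed index can be reopened later. The base URL and index name used with this code are placeholders, and the requests are only built here, not sent.

```python
from urllib.request import Request


def close_index_request(base_url: str, index: str) -> Request:
    """Build (but do not send) a request that closes an index."""
    return Request(f"{base_url}/{index}/_close", method="POST")


def open_index_request(base_url: str, index: str) -> Request:
    """Build (but do not send) a request that reopens a closed index."""
    return Request(f"{base_url}/{index}/_open", method="POST")
```

Deleting would only follow once a reasonable period has passed without errors from CirrusSearch.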

Wed, Apr 14, 7:09 PM · SecTeam-Processed, Discovery-Search, Vuln-Infoleak, Security, Tool-global-search

Tue, Apr 13

EBernhardson moved T278209: MediaSearch results not updated 12 hours after overwriting image from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Tue, Apr 13, 6:22 PM · MW-1.37-notes (1.37.0-wmf.3; 2021-04-27), Discovery-Search (Current work), Structured-Data-Backlog, SDAW-MediaSearch, CirrusSearch
EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

Would putting everything in the backend solve the Vue.js problem, or would the frontend still need some tweaking to do the right thing?

Tue, Apr 13, 5:40 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T231517: Investigate and fix GC issues on cloudelastic machines.

Also potentially related:

  • T279009: cloudelastic has an extra outdated copy of every index that needs to be cleaned up
  • T279092: I noted these alerts before going on vacation and made a ticket with a few thoughts. It could perhaps be merged here?
Tue, Apr 13, 4:12 PM · Patch-For-Review, Discovery-Search

Thu, Apr 1

EBernhardson updated subscribers of T279092: Resolve repeated GC alerts from cloudelastic.
Thu, Apr 1, 6:11 PM · Discovery-Search
EBernhardson added a comment to T279092: Resolve repeated GC alerts from cloudelastic.

I suppose the alternate step 1 is to restart the JVMs and see if it happens again (it usually does).

Thu, Apr 1, 6:10 PM · Discovery-Search
EBernhardson added a project to T279092: Resolve repeated GC alerts from cloudelastic: Discovery-Search.
Thu, Apr 1, 6:05 PM · Discovery-Search
EBernhardson created T279092: Resolve repeated GC alerts from cloudelastic.
Thu, Apr 1, 6:04 PM · Discovery-Search

Wed, Mar 31

EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

Realized the superset dashboard doesn't break down any stats by wiki, and adding that isn't particularly easy. The most important stat is probably the prevalence of mismatch sessions, here is a quick breakdown from hive for a single day:

Wed, Mar 31, 10:09 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T279009: Cleanup duplicate indices in cloudelastic.

This shows a single wiki as an example, but this is repeated for all of the wikis that are split between omega and psi. Here acewiki correctly does not exist on 9243 (chi). It should not exist on 9443 (omega), but does exist on cloudelastic:9443. It should exist on 9643 (psi) and does in all clusters.
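
The check described above can be sketched as a small helper. The port-to-cluster mapping is taken from the comment; the function name and the way the observed ports are supplied are assumptions for illustration.

```python
from typing import Iterable, List

# Port-to-cluster mapping as stated in the comment above.
PORT_TO_CLUSTER = {9243: "chi", 9443: "omega", 9643: "psi"}


def unexpected_copies(expected_cluster: str, ports_with_index: Iterable[int]) -> List[int]:
    """Return the ports where an index exists but should not."""
    return sorted(
        port for port in ports_with_index
        if PORT_TO_CLUSTER[port] != expected_cluster
    )


# acewiki belongs on psi; on cloudelastic it was also seen on 9443 (omega).
stale = unexpected_copies("psi", [9443, 9643])
```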

Wed, Mar 31, 8:05 PM · SecTeam-Processed, Discovery-Search, Vuln-Infoleak, Security, Tool-global-search
EBernhardson created T279009: Cleanup duplicate indices in cloudelastic.
Wed, Mar 31, 7:51 PM · SecTeam-Processed, Discovery-Search, Vuln-Infoleak, Security, Tool-global-search

Tue, Mar 30

EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

I don't have a solution for the autocomplete problem. Perhaps we need a hybrid solution where buckets are constantly assigned from the backend, and Special:Search auto-magically uses the bucket, but autocomplete requests will still have to include the query string parameter. This feels messy, will ponder more.

Tue, Mar 30, 11:22 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

Tracing things back through git history, I end up at 1fcba848 from T121542 adding the trigger functionality. The commit message justifies adding A/B testing to the frontend because textcat was going to need some special query parameters, and this allowed the frontend to provide the single testing parameter instead of using various cirrus debug query params. I found this a bit unclear, so pieced together a bit more of the history.

Tue, Mar 30, 6:52 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

I took a sample of 30 complete sessions, joined against cirrus backend logs for information like referers and actual query strings. I reviewed this for inconsistencies, then tried to calculate stats about how prevalent those inconsistencies are in the full dataset of sessions that are in a mismatch state.

Tue, Mar 30, 4:19 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T274583: Reload ORES data into weighted_tags.

Completed a number of namespaces, it's up to 14 now. Taking its time but looking good.

Tue, Mar 30, 4:16 PM · Patch-For-Review, Discovery-Search (Current work), Growth-Structured-Tasks
EBernhardson added a comment to T278660: Clean up OneStepUserNameQuery/TwoStepUserNameQuery.

@DannyS712 could you provide links to the queries?

Side note, if we can track down the people that made the comments in documentation I suggest subscribing them to this task.

Tue, Mar 30, 3:22 PM · Growth-Team-Filtering, DBA, User-DannyS712, Growth-Team, StructuredDiscussions

Fri, Mar 26

EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

David suggested that starting a session without going through autocomplete could perhaps be a source of problems. Looking specifically into sessions starting on enwiki and dewiki, of the sessions that have mismatch events ~75% have a mismatch as the first event we see. Of the sessions that have an autocomplete dt prior to a fulltext dt (filtering ac-only), it looks like only 13% transition into the mismatch state.

Fri, Mar 26, 8:14 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

Poking at the collected data for one day, the number of searches per session looks consistent, but plotting the number of sessions by session length is suspicious, with a big bump (on log scale!) for sessions ending at around 100 seconds in the mismatch bucket.

Fri, Mar 26, 3:56 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

Quickly looked when I saw that frwiki has the new search widget enabled but not dewiki/enwiki. Looking at the data it seems frwiki is heavily affected (~20% of the sessions have an event in mismatch or invalid as opposed to 1%/2% for other wikis):

(period 2021-03-23T00:00:00 to 2021-03-23T04:00:00)

+------+--------------+----------------+-----------+
|  wiki|total_sessions|invalid_sessions|pct_invalid|
+------+--------------+----------------+-----------+
|dewiki|        874925|           16805|       1.92|
|enwiki|       4442523|           98517|       2.22|
|frwiki|         33366|            6423|      19.25|
+------+--------------+----------------+-----------+
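
For reference, the pct_invalid column can be reproduced from the two count columns. This is only a sketch of the arithmetic; the original numbers came from a hive/Spark query that is not shown.

```python
def pct_invalid(total_sessions: int, invalid_sessions: int) -> float:
    """Share of sessions with a mismatch or invalid event, as a percentage."""
    return round(100 * invalid_sessions / total_sessions, 2)


# Counts copied from the table above: (total_sessions, invalid_sessions).
counts = {
    "dewiki": (874925, 16805),
    "enwiki": (4442523, 98517),
    "frwiki": (33366, 6423),
}
percentages = {wiki: pct_invalid(*pair) for wiki, pair in counts.items()}
```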
Fri, Mar 26, 3:42 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)

Thu, Mar 25

EBernhardson added a comment to T274583: Reload ORES data into weighted_tags.

enwiki ns 0 has completed, ns 1 is working its way through. Optimistically, looks like this should work out and complete.

Thu, Mar 25, 10:14 PM · Patch-For-Review, Discovery-Search (Current work), Growth-Structured-Tasks
EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

Looking at this from more of an events/stats perspective, what can we see is different in the mismatched sessions? I first noticed that for automatically rewritten queries mismatched sessions only see 10% with interaction, but the control and test bucket are around 30%. Similarly mismatch sessions are only rewriting 45% of zero results queries, while the test and control buckets are seeing closer to 60% rewrite rates.

Thu, Mar 25, 7:55 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

I've been able to enter a mismatched state from incognito windows multiple times now, but it's not clear what the trigger is. It seems we have two different options: we could try to fix the frontend bucketing, or fall back on backend bucketing, which we previously implemented as well. For some reason I can't remember, we quickly transitioned from the backend doing bucketing to doing the bucketing inside the frontend browser code. Perhaps the problem was that the only way to thread arbitrary extra data like a bucket through API responses is to inject it into headers, or something like that.

Thu, Mar 25, 6:47 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)

Wed, Mar 24

EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

Counting mismatches as any query in a session that contains mismatched events, we have 42% of sessions and 52% of search requests falling into the mismatched bucket. In some testing in an incognito window, by the time I figured out how to set the breakpoint inside searchSatisfaction.js my subTest was already set to mismatched. Clearly we need to dig into the search satisfaction tracking and figure out what's going on here if we want to have usable AB test results.

Wed, Mar 24, 10:18 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

Data is suspicious. The mismatch bucket, which has searches where the testing bucket reported by the backend is different than the frontend expected, is 44% of all search requests. The backend aggregation looks to be a bit optimistic here as well: the reported bucket is whichever test it saw first (unordered) on a per-session basis, rather than whether that particular search reported a mismatch. I'm currently testing a patch that will mark a full session as mismatch if any event in the session is a mismatch, which should hopefully give a better idea of the scope of the issue.
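
The session-level marking described above could look roughly like this. It is a plain-Python sketch, not the actual patch; the `(session_id, bucket)` event shape and the literal bucket name "mismatch" are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def session_buckets(events: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Assign one bucket per session.

    A session is marked "mismatch" if any of its events is a mismatch;
    otherwise it keeps the bucket of the first event seen.
    """
    seen: Dict[str, List[str]] = defaultdict(list)
    for session_id, bucket in events:
        seen[session_id].append(bucket)
    return {
        session_id: "mismatch" if "mismatch" in buckets else buckets[0]
        for session_id, buckets in seen.items()
    }
```

Compared with taking whichever bucket is seen first, this can only move sessions into the mismatch bucket, never out of it.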

Wed, Mar 24, 8:19 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a reverting change for rWDAN3fd7d7b9d737: set mediawiki_active_datacenter to codfw: rWDAN0be522e69957: Revert "set mediawiki_active_datacenter to codfw".
Wed, Mar 24, 7:25 PM
EBernhardson committed rWDAN0be522e69957: Revert "set mediawiki_active_datacenter to codfw" (authored by EBernhardson).
Revert "set mediawiki_active_datacenter to codfw"
Wed, Mar 24, 7:25 PM
EBernhardson added a comment to T274220: Populate MachineVision databases for images commonly returned by search.

database is imported to ebernhardson.machine_vision_safe_search/date=20200323, haven't had a chance to dig into it yet.

Wed, Mar 24, 7:16 PM · Discovery-Search (Current work), Structured-Data-Backlog, MachineVision

Tue, Mar 23

EBernhardson committed rWDAN3fd7d7b9d737: set mediawiki_active_datacenter to codfw (authored by EBernhardson).
set mediawiki_active_datacenter to codfw
Tue, Mar 23, 10:38 PM
EBernhardson committed rWDAN32eed1905608: airflow: Partition ores export tasks by namespace (authored by EBernhardson).
airflow: Partition ores export tasks by namespace
Tue, Mar 23, 10:38 PM
EBernhardson added a comment to T274583: Reload ORES data into weighted_tags.

Reworked exports so we can run a task per namespace. Triggered a new run of the ores_predictions_bulk_ingest dag and manually marked all the articletopic tasks as success so it skips them and only does the drafttopic. Now waiting for it to complete.

Tue, Mar 23, 10:32 PM · Patch-For-Review, Discovery-Search (Current work), Growth-Structured-Tasks
EBernhardson moved T274583: Reload ORES data into weighted_tags from Needs review to Waiting on the Discovery-Search (Current work) board.
Tue, Mar 23, 10:32 PM · Patch-For-Review, Discovery-Search (Current work), Growth-Structured-Tasks
EBernhardson moved T276385: [Log noise] "Prefix search request was longer than the maximum allowed length." from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Tue, Mar 23, 6:58 PM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Discovery-Search (Current work), Discovery, CirrusSearch, Wikimedia-production-error
EBernhardson moved T269493: Add hasrecommendation: search keyword from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Tue, Mar 23, 6:58 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Growth-Team (Current Sprint), Add-Link, Image-Recommendations, Discovery-Search (Current work), CirrusSearch
EBernhardson moved T277332: Uncaught Error: Widget not found / Call to a member function getNsIndex() on null on CirrusSearch result page with internal error from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Tue, Mar 23, 6:54 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Discovery-Search (Current work), CirrusSearch, Wikimedia-production-error
EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

Test is started, results will be found in the superset Search Query Suggestions dashboard. Data is loaded into this dashboard daily, with the prior day's data arriving around 3:00 UTC. The test will be run for 7 days; assuming data collection looks reasonable, that means turning it off next Monday.

Tue, Mar 23, 6:27 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T262612: Run an A/B test using suggestions generated using glent Method 1 from Needs review to Waiting on the Discovery-Search (Current work) board.
Tue, Mar 23, 6:13 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)

Mon, Mar 22

EBernhardson added a comment to T277213: Eliminate old M2 suggestions with improper tokenization.

I've spent some time Friday and again today looking at the queries found in the second csv but not the first. Everything I've looked at (only a few dozen) seems reasonable on closer inspection. The only particularly suspicious thing is that there is a class of queries that don't have a dym in the rerun, but when I run them through the test suite they provide the expected suggestion. Since the suggestion algo seems to still be correct, I put together a test case that runs the whole suggester, simulating the input dataframes, but it still looks reasonable. I'm not really finding an answer, tempted to call it "good enough".

Mon, Mar 22, 8:36 PM · Patch-For-Review, Discovery-Search (Current work), Chinese-Sites

Mar 18 2021

EBernhardson added a comment to T258738: Build query-clicks dataset from SearchSatisfaction logging.

From discussions:

Mar 18 2021, 11:42 PM · Discovery-Search (Current work)
EBernhardson added a comment to T277213: Eliminate old M2 suggestions with improper tokenization.

First off, 俸納 doesn't get any suggestions in Chinese, but it does get 奉納 as a suggestion in Japanese. They use the same confusion tables, but different frequency tables. Which brings up a few thoughts:

Could you re-generate glent_m2_rerun_filtered.csv with the language used?

Mar 18 2021, 11:42 PM · Patch-For-Review, Discovery-Search (Current work), Chinese-Sites
EBernhardson claimed T262612: Run an A/B test using suggestions generated using glent Method 1.
Mar 18 2021, 7:13 PM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)

Mar 17 2021

EBernhardson added a comment to T274220: Populate MachineVision databases for images commonly returned by search.

All imports have completed. Next step is to re-run the previous work joining the datasets and verify we now have an acceptable percentage of queries with predictions.

Mar 17 2021, 11:32 PM · Discovery-Search (Current work), Structured-Data-Backlog, MachineVision
EBernhardson moved T277213: Eliminate old M2 suggestions with improper tokenization from In Progress to Needs review on the Discovery-Search (Current work) board.
Mar 17 2021, 11:20 PM · Patch-For-Review, Discovery-Search (Current work), Chinese-Sites
EBernhardson moved T277213: Eliminate old M2 suggestions with improper tokenization from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Mar 17 2021, 11:18 PM · Patch-For-Review, Discovery-Search (Current work), Chinese-Sites
EBernhardson claimed T277213: Eliminate old M2 suggestions with improper tokenization.
Mar 17 2021, 11:18 PM · Patch-For-Review, Discovery-Search (Current work), Chinese-Sites
EBernhardson claimed T277332: Uncaught Error: Widget not found / Call to a member function getNsIndex() on null on CirrusSearch result page with internal error.
Mar 17 2021, 7:46 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Discovery-Search (Current work), CirrusSearch, Wikimedia-production-error
EBernhardson moved T277332: Uncaught Error: Widget not found / Call to a member function getNsIndex() on null on CirrusSearch result page with internal error from Ready for Development to Needs review on the Discovery-Search (Current work) board.
Mar 17 2021, 7:46 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Discovery-Search (Current work), CirrusSearch, Wikimedia-production-error
EBernhardson added a comment to T277213: Eliminate old M2 suggestions with improper tokenization.

I tried reviewing some of the changes, particularly the 7,870 queries that used to have suggestions but no longer do, and it's not clear to me. For example 俸納 used to suggest 奉納 but doesn't any more. This doesn't seem to match the patterns we are dealing with here, but perhaps that was a previous bug that was fixed and we are only now doing a rerun of historical m2?

Mar 17 2021, 7:02 PM · Patch-For-Review, Discovery-Search (Current work), Chinese-Sites

Mar 16 2021

EBernhardson added a comment to T277213: Eliminate old M2 suggestions with improper tokenization.

I'm not entirely sure if it's correct, but in theory ebernhardson.glent_suggestions/algo=m2run/date=20210313 should contain all the queries in the regular m2run history, but re-run with the updated algorithm. This is mostly a naive re-shaping of the historical data to look like it's a log and processing it that way.

Mar 16 2021, 9:42 PM · Patch-For-Review, Discovery-Search (Current work), Chinese-Sites
EBernhardson moved T265914: Investigate Resource Needs for Commons and Wikidata Elasticsearch indices from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mar 16 2021, 7:01 PM · Discovery-Search (Current work)
EBernhardson updated subscribers of T258738: Build query-clicks dataset from SearchSatisfaction logging.

@zpapierski @dcausse Does the above seem to cover what we want from this dataset?

Mar 16 2021, 5:58 PM · Discovery-Search (Current work)
EBernhardson added a comment to T274583: Reload ORES data into weighted_tags.

Articletopic should be fully loaded into prod now, the ores_articletopics and the weighted_tags fields. We will have to decide if we are going to push through drafttopic and refactor the orchestration into smaller pieces that don't retry on a week-long window.

Mar 16 2021, 3:36 PM · Patch-For-Review, Discovery-Search (Current work), Growth-Structured-Tasks
EBernhardson added a comment to T277332: Uncaught Error: Widget not found / Call to a member function getNsIndex() on null on CirrusSearch result page with internal error.

This seems to be a misunderstanding in review, CirrusSearch uses ContLang both for local-wiki and cross-wiki behavior. When the wiki is remote Cirrus provides ContLang, but when the wiki is local it depends on the global variable to already exist. I suspect the misunderstanding revolves around how cirrus queries indices and returns results for any wiki in the cluster, and not only the local wiki being queried.

Mar 16 2021, 3:03 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Discovery-Search (Current work), CirrusSearch, Wikimedia-production-error
EBernhardson moved T262612: Run an A/B test using suggestions generated using glent Method 1 from Waiting to Needs review on the Discovery-Search (Current work) board.
Mar 16 2021, 12:05 AM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

The above set of patches should start the test. The first two we should merge and deploy soon-ish; before actually starting the test we will want to run some test queries against prod and make sure it looks as we expect.

Mar 16 2021, 12:04 AM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)

Mar 15 2021

EBernhardson moved T276571: selenium-daily-beta-CirrusSearch is failing from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mar 15 2021, 11:37 PM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T265081: Fix Glent M2 CJK suggestion tokenization .

Seems we are about ready, should I run a release on glent and update airflow with the new jar?

Mar 15 2021, 11:35 PM · Discovery-Search (Current work), Chinese-Sites
EBernhardson added a comment to T258738: Build query-clicks dataset from SearchSatisfaction logging.

The following is hardly comprehensive, but I've tried to collect together information from various sources about what this dataset is. The overall goal seems to be: Provide simplified and unified access to full-text search queries issued to CirrusSearch

Mar 15 2021, 11:08 PM · Discovery-Search (Current work)
EBernhardson added a comment to T274583: Reload ORES data into weighted_tags.

articletopic dumps have been processed and uploaded to swift. This includes updates for ~35M pages and will likely take a day or two to make it through the indexing pipeline.

Mar 15 2021, 10:06 PM · Patch-For-Review, Discovery-Search (Current work), Growth-Structured-Tasks
EBernhardson committed rWDAN4300929301dc: convert_to_esbulk: Accept partial hour timestamps (authored by EBernhardson).
convert_to_esbulk: Accept partial hour timestamps
Mar 15 2021, 8:58 PM
EBernhardson committed rWDAN82e0654d1841: prepare_rev_score: Rename scores_export to bulk_ingest (authored by EBernhardson).
prepare_rev_score: Rename scores_export to bulk_ingest
Mar 15 2021, 7:56 PM
EBernhardson committed rWDAN05e42b01687e: airflow tox: Require sqlalchemy < 1.4.0 (authored by EBernhardson).
airflow tox: Require sqlalchemy < 1.4.0
Mar 15 2021, 7:52 PM
EBernhardson added a comment to T274583: Reload ORES data into weighted_tags.

The script finished, but the processing framework OOM'd while finishing up and putting everything where it belongs. For now I'm bypassing the drafttopic dump which will allow articletopic to ship to the cluster. To run drafttopic we will need a minor refactor of the orchestration to partition the intermediate data by namespace and re-run drafttopic one namespace at a time.

Mar 15 2021, 7:44 PM · Patch-For-Review, Discovery-Search (Current work), Growth-Structured-Tasks
EBernhardson added a comment to T277332: Uncaught Error: Widget not found / Call to a member function getNsIndex() on null on CirrusSearch result page with internal error.

Reasonable chance this is because we are pulling ContLang from SearchConfig; ContLang was never configuration and might no longer be accessible that way.

Mar 15 2021, 3:56 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Discovery-Search (Current work), CirrusSearch, Wikimedia-production-error

Mar 12 2021

EBernhardson added a comment to T274583: Reload ORES data into weighted_tags.

It's still running. Looks like it's requested just under 34M of the expected ~40M predictions, with a current runtime of ~110 hours. Once the dump finishes it should automatically be processed and uploaded to the production clusters.

Mar 12 2021, 7:54 PM · Patch-For-Review, Discovery-Search (Current work), Growth-Structured-Tasks
EBernhardson added a comment to T277213: Eliminate old M2 suggestions with improper tokenization.

All seems reasonable to me.

Mar 12 2021, 7:24 PM · Patch-For-Review, Discovery-Search (Current work), Chinese-Sites
EBernhardson added a comment to T269493: Add hasrecommendation: search keyword.

classification.ores.articletopic/History and Society.Politics and government|0.85337919734571

I wonder if this breaks anything, the value after | should be an integer between 1 and 1000 (untested, but suspicious).

That's my fault, I used a script to import production ORES scores for a bunch of testwiki articles, but forgot to scale up the scores. I can fix the index data if it's worth the effort.
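
A small sketch of the missing scaling step. It assumes the ORES probability (0 to 1) is multiplied by 1000 and clamped into the 1 to 1000 integer range mentioned above; the exact scaling used in production is not stated here, and the function name is hypothetical.

```python
def format_weighted_tag(prefix: str, name: str, probability: float) -> str:
    """Render a weighted tag with an integer weight between 1 and 1000.

    Assumption: the weight is the probability scaled by 1000 and clamped.
    """
    score = max(1, min(1000, round(probability * 1000)))
    return f"{prefix}/{name}|{score}"


tag = format_weighted_tag(
    "classification.ores.articletopic",
    "History and Society.Politics and government",
    0.85337919734571,
)
```

Under that assumption, the unscaled value quoted above would be stored with a weight of 853.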

Mar 12 2021, 6:34 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Growth-Team (Current Sprint), Add-Link, Image-Recommendations, Discovery-Search (Current work), CirrusSearch
EBernhardson updated subscribers of T269493: Add hasrecommendation: search keyword.

classification.ores.articletopic/History and Society.Politics and government|0.85337919734571

I wonder if this breaks anything, the value after | should be an integer between 1 and 1000 (untested, but suspicious).

Mar 12 2021, 4:09 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Growth-Team (Current Sprint), Add-Link, Image-Recommendations, Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T269493: Add hasrecommendation: search keyword.

classification.ores.articletopic/History and Society.Politics and government|0.85337919734571

Mar 12 2021, 3:56 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Growth-Team (Current Sprint), Add-Link, Image-Recommendations, Discovery-Search (Current work), CirrusSearch

Mar 11 2021

EBernhardson moved T264053: Unsustainable increases in Elasticsearch cluster disk IO from Waiting to Needs Reporting on the Discovery-Search (Current work) board.
Mar 11 2021, 7:51 PM · Discovery-Search (Current work)
EBernhardson removed a project from T264053: Unsustainable increases in Elasticsearch cluster disk IO: Patch-For-Review.

After having sat with this for some time, the metrics look reasonably happy since mid-October when the above mitigations were applied. Since then, T271493 found what was likely the underlying cause of the increase in working set size, which has also been mitigated. The only remaining thing is to re-enable sister search for commonswiki; a sub-task has been created and this can be closed.

Mar 11 2021, 7:51 PM · Discovery-Search (Current work)
EBernhardson created T277225: Reenable commonswiki sister search.
Mar 11 2021, 7:51 PM · Discovery-Search (Current work)
EBernhardson added a comment to T276169: Don't make DYM suggestions with negation in them (Glent).

In theory we could do something like run !xnxx ignoring syntax, although that code path doesn't exist today. I feel like that makes lots of things more complicated though, pains are taken to try and ensure the query that was submitted and is represented in the logs is what we run when suggesting it to another user. Not just the query string, but also the context of the query. Suggesting a query to a user based on historical behavior, but then running the suggestion such that it's no longer the same query as was recorded, seems incorrect.

Mar 11 2021, 7:05 PM · Discovery-Search
EBernhardson moved T276385: [Log noise] "Prefix search request was longer than the maximum allowed length." from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mar 11 2021, 5:12 PM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Discovery-Search (Current work), Discovery, CirrusSearch, Wikimedia-production-error

Mar 10 2021

EBernhardson added a comment to T277045: The parameter srbackend is no more effective in Special:Search.

Looking at this it seems like setting searchEngineType so late is unintentional. Moving it to the top of load() seems the most sensible, it matches the purpose stated in the doc comment. Might be nice to also add a test case that verifies the appropriate search engine is chosen.

Mar 10 2021, 7:35 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), MediaWiki-Search, Discovery-Search
EBernhardson claimed T276385: [Log noise] "Prefix search request was longer than the maximum allowed length.".
Mar 10 2021, 1:24 AM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Discovery-Search (Current work), Discovery, CirrusSearch, Wikimedia-production-error
EBernhardson moved T276385: [Log noise] "Prefix search request was longer than the maximum allowed length." from Ready for Development to Needs review on the Discovery-Search (Current work) board.
Mar 10 2021, 1:24 AM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Discovery-Search (Current work), Discovery, CirrusSearch, Wikimedia-production-error
EBernhardson added a comment to T274220: Populate MachineVision databases for images commonly returned by search.

So far 352 files have processed, with 115 remaining.

Mar 10 2021, 1:00 AM · Discovery-Search (Current work), Structured-Data-Backlog, MachineVision
EBernhardson added a comment to T276385: [Log noise] "Prefix search request was longer than the maximum allowed length.".

While pondering a fix, I wonder if CirrusSearch really even needs to throw an exception here. Page titles are well documented as limited to 255 bytes. In other cases where we receive queries with no possible answer we simply return no results as the correct answer. That seems sane here, and if we want to provide a better UX in this narrow case (I suspect these are mostly automated and it helps the rare human) that can be implemented in a sensible layer.
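
The idea above, sketched in Python rather than CirrusSearch's PHP: a prefix query longer than any possible title returns an empty result instead of raising. The `backend` callable and function name are hypothetical stand-ins.

```python
from typing import Callable, List

# Page titles are limited to 255 bytes, so longer prefixes cannot match.
MAX_TITLE_BYTES = 255


def prefix_search(query: str, backend: Callable[[str], List[str]]) -> List[str]:
    """Return no results for prefixes that no title could ever match."""
    if len(query.encode("utf-8")) > MAX_TITLE_BYTES:
        return []
    return backend(query)
```

The length is measured in UTF-8 bytes rather than characters, so multi-byte scripts hit the limit with fewer characters.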

Mar 10 2021, 12:17 AM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Discovery-Search (Current work), Discovery, CirrusSearch, Wikimedia-production-error
EBernhardson claimed T274220: Populate MachineVision databases for images commonly returned by search.
Mar 10 2021, 12:06 AM · Discovery-Search (Current work), Structured-Data-Backlog, MachineVision
EBernhardson moved T274220: Populate MachineVision databases for images commonly returned by search from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Mar 10 2021, 12:05 AM · Discovery-Search (Current work), Structured-Data-Backlog, MachineVision

Mar 9 2021

EBernhardson added a comment to T276169: Don't make DYM suggestions with negation in them (Glent).

A cutoff wouldn't be too hard to add: we can accept a timestamp in the CLI arguments and then apply it. I suppose we could deploy and let it run once with a shorter timespan, then update the calling pieces to start providing -15 months.
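Roughly what that could look like, as a sketch only: the `--cutoff` flag name, the `parse_args` and `apply_cutoff` helpers, and the row shape (a dict with a `ts` datetime) are all assumptions for illustration, not the real Glent CLI.

```python
import argparse
from datetime import datetime


def parse_args(argv):
    """Parse a hypothetical --cutoff flag given as an ISO 8601 timestamp."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--cutoff",
        type=datetime.fromisoformat,
        default=None,
        help="drop suggestion rows whose ts is older than this timestamp",
    )
    return parser.parse_args(argv)


def apply_cutoff(rows, cutoff):
    """Keep only rows whose ts is at or after the cutoff (no cutoff keeps all)."""
    if cutoff is None:
        return list(rows)
    return [row for row in rows if row["ts"] >= cutoff]
```

Leaving the flag optional means a deploy changes nothing until the callers start passing a value, which fits the two-step rollout described above.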

Mar 9 2021, 9:01 PM · Discovery-Search
EBernhardson updated subscribers of T265914: Investigate Resource Needs for Commons and Wikidata Elasticsearch indices.
Mar 9 2021, 7:34 PM · Discovery-Search (Current work)
EBernhardson added a comment to T276169: Don't make DYM suggestions with negation in them (Glent).

The suggestion table includes a ts, which is the most recent timestamp at which this (query, dym) pair was seen as the top pairing for the given query. Looking at only the rows that suggest '!xnxx', the most recent timestamp is 2020-4-24T23:37:47Z. Basically, yes, they are old and it seems likely we are now filtering these. Two things occur to me while looking at this:

Mar 9 2021, 7:06 PM · Discovery-Search
EBernhardson created P14709 (An Untitled Masterwork).
Mar 9 2021, 6:31 PM

Mar 8 2021

EBernhardson added a comment to T274583: Reload ORES data into weighted_tags.

Same thing: the node it was running on was taken down for reimage this morning. It's now running on a host that's already been reimaged, letting it try again.

Mar 8 2021, 10:52 PM · Patch-For-Review, Discovery-Search (Current work), Growth-Structured-Tasks
EBernhardson added a comment to T265914: Investigate Resource Needs for Commons and Wikidata Elasticsearch indices.

I completely forgot about how it will behave while reindexing. Indeed, we must account for two full copies of any one index as part of normal operations. While reviewing the previous order to see if it would make sense to add disks to these machines, I noticed the above spec for a recent server build is incorrect. Specifically, the most recent machines have 2x1.75TB disks, with 3.4T usable on /srv. That brings 10 servers up to 34TB, matching the identified need.
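The arithmetic, as a small sketch: the server count and usable space per server are the figures from this comment, while `peak_need_tb` encodes my reading of the reindexing constraint (steady-state total plus one extra copy of the largest index). Any index sizes passed to it are hypothetical.

```python
def usable_capacity_tb(servers, usable_tb_per_server):
    """Total usable space across identical servers, in TB."""
    return servers * usable_tb_per_server


def peak_need_tb(index_sizes_tb):
    """Space needed while any single index is being reindexed.

    Assumption: only one index is reindexed at a time, so the worst case
    is every index plus a second full copy of the largest one.
    """
    if not index_sizes_tb:
        return 0.0
    return sum(index_sizes_tb) + max(index_sizes_tb)


# Figures from the comment: 10 servers with 3.4T usable on /srv each.
capacity = usable_capacity_tb(10, 3.4)
```

Comparing `peak_need_tb(...)` for real index sizes against `capacity` would show how much headroom the 34TB actually leaves.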

Mar 8 2021, 10:31 PM · Discovery-Search (Current work)

Mar 5 2021

EBernhardson committed rWDAN4cc913ecf651: Correct refinery-drop-older-than checksum (authored by EBernhardson).
Correct refinery-drop-older-than checksum
Mar 5 2021, 12:47 AM

Mar 4 2021

EBernhardson added a comment to T274321: relforge: discuss possible PII concerns with relforge data.

Simple summary of current relforge usage by the analytics network (and/or elsewhere if there's any other usage), and ideally a super-high level description of the flow of data

Mar 4 2021, 8:47 PM · Discovery-Search (Current work)
EBernhardson moved T273847: Create a elasticsearch/kibana index with queries to allow query completion candidate research from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mar 4 2021, 8:01 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson created T276492: Notifications when prometheus daemons are wedged.
Mar 4 2021, 7:04 PM · observability, Discovery-Search
EBernhardson added a comment to T274583: Reload ORES data into weighted_tags.

The node the job was running on was taken down for a reimage; it has restarted on another host.

Mar 4 2021, 6:30 PM · Patch-For-Review, Discovery-Search (Current work), Growth-Structured-Tasks

Mar 3 2021

EBernhardson committed rWDAN200f5d621f0a: Fix search satisfaction loading into druid (authored by EBernhardson).
Fix search satisfaction loading into druid
Mar 3 2021, 11:48 PM
EBernhardson committed rWDAN7f37d40ae43f: Replace refinery-drop-hive-partitions with refinery-drop-older-then (authored by EBernhardson).
Replace refinery-drop-hive-partitions with refinery-drop-older-then
Mar 3 2021, 9:53 PM
EBernhardson moved T265914: Investigate Resource Needs for Commons and Wikidata Elasticsearch indices from In Progress to Needs review on the Discovery-Search (Current work) board.
Mar 3 2021, 12:56 AM · Discovery-Search (Current work)
EBernhardson added a comment to T265914: Investigate Resource Needs for Commons and Wikidata Elasticsearch indices.

Looked into the dashboards we created to see what the state is today and how that data has changed since late November, when we deployed stats collection:

Mar 3 2021, 12:55 AM · Discovery-Search (Current work)

Mar 2 2021

EBernhardson added a comment to T274583: Reload ORES data into weighted_tags.

Restarted the dump after deploying the change to error_threshold; it was only ~4 hours into the run since the last failure. The last failure was:

Mar 2 2021, 8:43 PM · Patch-For-Review, Discovery-Search (Current work), Growth-Structured-Tasks
EBernhardson added a comment to T271493: Implement 50kb limit on file text indexing for to reduce increasing commonswiki_file on-disk size.

Merge finished, with minimal impact. It triggered only ~700GB of merges in 10.5T worth of shards, not a large enough percentage to have a significant effect.

Mar 2 2021, 8:09 PM · MW-1.36-notes (1.36.0-wmf.27; 2021-01-19), Patch-For-Review, Commons, Discovery-Search (Current work)
EBernhardson moved T274314: relforge: open up access to relforge100[3,4] from Ready for Development to Needs Reporting on the Discovery-Search (Current work) board.
Mar 2 2021, 6:28 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson added a comment to T274314: relforge: open up access to relforge100[3,4].

Fixed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/666775

Mar 2 2021, 6:27 PM · Patch-For-Review, Discovery-Search (Current work)