EBernhardson (EBernhardson)
User

User Details

User Since
Oct 7 2014, 4:49 PM (281 w, 1 d)
Availability
Available
LDAP User
EBernhardson
MediaWiki User
EBernhardson (WMF)

Recent Activity

Tue, Feb 25

EBernhardson added a comment to T246177: Track maintenance operations for data pipelines.

Oozie complained about discovery-query-clicks-hourly multiple times, which is typical. Ran P10525 (fix-oozie.sh).

Tue, Feb 25, 11:25 PM · User-EBernhardson, Discovery-Search (Current work)
EBernhardson created P10525 fix-oozie.sh.
Tue, Feb 25, 11:23 PM
EBernhardson created T246177: Track maintenance operations for data pipelines.
Tue, Feb 25, 11:21 PM · User-EBernhardson, Discovery-Search (Current work)
EBernhardson added a comment to T245677: Reader searches with romanized version of non-Latin script.

We (with @TJones) have talked about this in the past, but it never made it high enough up the priority list. Essentially, the existing language detection code can be repurposed to detect input such as "Hebrew but transliterated to QWERTY", after which it can transliterate and run a second-try search (the "Showing results for washington. No results found for Washingtxn" behavior) if the first search has poor enough results. There is nothing groundbreaking here, but it would have to be prioritized, as it will take some time to work out properly without simply doubling the query load for certain languages.

Tue, Feb 25, 4:34 PM · Discovery-Search, Core Platform Team Workboards (Green), Story, MediaWiki-REST-API, CPT Initiatives (Core REST API in PHP)
EBernhardson committed rEWCSdc7440e21d9e: Only instantiate HitHandler if needed (authored by EBernhardson).
Tue, Feb 25, 8:02 AM

Mon, Feb 24

EBernhardson created P10505 (An Untitled Masterwork).
Mon, Feb 24, 8:43 PM
EBernhardson moved T244736: Migrate Elasticsearch to Debian Buster from needs triage to Ops / SRE on the Discovery-Search board.
Mon, Feb 24, 6:02 PM · Operations, Discovery-Search

Fri, Feb 21

EBernhardson added a comment to T243357: Once the ORES drafttopic - ElasticSearch pipeline is set up, update data about all articles.

It wouldn't be too hard to adjust mw_prepare_rev_score.py to read its data from some other source. At some point in that script we have a DataFrame containing three fields: `(wikiid, page_id, dict from label to prediction probability)`. It sounds like we should be able to generate a dataset in that format from oresapi; for a one-time script I can hack reading that in relatively easily. The simplest format for exchange is probably gzip'd files containing a JSON row per line. If the dataset is large, these can be split across multiple files.
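
A minimal sketch of that exchange format, assuming one JSON object per line inside a gzip file. The key name `probability`, the file name, and the topic labels are illustrative assumptions; only `wikiid`, `page_id`, and the label-to-probability dict come from the comment above.

```python
import gzip
import json
from typing import Dict, Iterable, Iterator, Tuple

# (wikiid, page_id, {label: prediction probability})
Row = Tuple[str, int, Dict[str, float]]


def write_rows(path: str, rows: Iterable[Row]) -> None:
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for wikiid, page_id, probs in rows:
            record = {"wikiid": wikiid, "page_id": page_id, "probability": probs}
            f.write(json.dumps(record) + "\n")


def read_rows(path: str) -> Iterator[Row]:
    """Stream rows back out of a file produced by write_rows."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # tolerate a trailing blank line
                rec = json.loads(line)
                yield rec["wikiid"], rec["page_id"], rec["probability"]
```

A large dataset could be split by calling `write_rows` once per chunk with a different file name; the reader then iterates over the files in turn.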

Fri, Feb 21, 4:32 PM · Scoring-platform-team, Discovery-Search, Growth-Team (Current Sprint), NewcomerTasks 1.1

Thu, Feb 20

EBernhardson added a comment to T240550: Add mapping for ORES topic field in ElasticSearch.

The reindex won't block anything. Essentially, Elasticsearch will store all the data we send to it, but it's only searchable once the reindex process makes it to that wiki.

Thu, Feb 20, 6:14 PM · MW-1.35-notes (1.35.0-wmf.15; 2020-01-14), Discovery-Search (Current work)
EBernhardson added a comment to T244297: Newcomer tasks: set initial thresholds for ORES articletopic.

The per-topic thresholding is now deployed. I ran only the thresholding and extraction portion of last week's job to see how it would look. The job will do a full run, where the predictions are also shipped to Elasticsearch, on Sunday (Feb 23rd).

Thu, Feb 20, 5:54 PM · Scoring-platform-team (Current), Discovery-Search (Current work), Growth-Team (Current Sprint)
EBernhardson added a comment to T240550: Add mapping for ORES topic field in ElasticSearch.

The reindex process is running and will probably finish late next week.

Thu, Feb 20, 5:19 PM · MW-1.35-notes (1.35.0-wmf.15; 2020-01-14), Discovery-Search (Current work)
EBernhardson added a comment to T240702: mediawiki.job.cirrusSearchElasticaWrite topics need more partitions!.

Reviewing how our job queue usage is going, the pre-partitioned queue here backlogs fairly significantly, up to ~500k messages, while the post-partitioned queue only backlogs when the consumer decides to stop reading for 10-30 minutes (separate ticket, T224425). As some approximate stats, on 2020-02-20 08:00-09:00 UTC the committed offset of the partitioner went up by ~3M, and the peak backlog over this period was ~500k jobs. This is around 850 jobs/sec, which puts 500k jobs at a 10-minute backlog. This happens for an hour every two hours when the scheduled jobs queue up.
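
The arithmetic behind those figures can be checked with a short sketch. It uses the rounded numbers from the comment (~3M offsets in one hour, ~500k peak backlog), so the rate comes out near 833 jobs/sec, in line with the "around 850" quoted above.

```python
def backlog_minutes(offset_increment: float, window_seconds: float, peak_backlog: float):
    """Return (jobs per second, minutes needed to drain the peak backlog at that rate)."""
    rate = offset_increment / window_seconds
    return rate, peak_backlog / rate / 60


# Rounded inputs taken from the comment: ~3M jobs in one hour, ~500k peak backlog.
rate, minutes = backlog_minutes(3_000_000, 3600, 500_000)
print(f"{rate:.0f} jobs/sec, {minutes:.1f} minute backlog")
```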

Thu, Feb 20, 5:13 PM · Core Platform Team Workboards (Clinic Duty Team), Discovery-Search
EBernhardson added a comment to T245203: Create production and canary releases for existent eventgate helmfile services.

Re-deployed our glent esbulk oozie job against refinery versioned 2020-02-19T16.58.16+00.00--scap_sync_2020-02-19_0001. Additionally shipped an update to our airflow scheduler that changes the eventgate port used there as well.

Thu, Feb 20, 12:05 AM · Patch-For-Review, Analytics-Kanban, Analytics, serviceops

Wed, Feb 19

EBernhardson committed rWDAN5ad38f6d6638: ores articletopic: per-topic thresholding (authored by EBernhardson).
Wed, Feb 19, 11:06 PM
EBernhardson committed rWDANdcbdcaa2f0a1: glent: ESBulk only needs most recent m2run partition (authored by EBernhardson).
Wed, Feb 19, 11:06 PM
EBernhardson committed rWDANff38cad82f08: Change eventgate port to 4592 (authored by EBernhardson).
Wed, Feb 19, 11:06 PM
EBernhardson added a comment to T219534: Test MLR models for zhwiki, jawiki and kowiki.

@dcausse looked into the rescore building: RescoreBuilder::isProfileSyntaxSupported is rejecting the profile containing the LTR because the query_string search query syntax is reported. Essentially this is saying that the search query needs further parsing on the Elasticsearch side, and since our LTR query can't be modified to apply that parsing, it instead rejects it outright. Most likely we could add a regex to check how simple the query string is and allow through all queries that are simple search strings and contain no syntax that would be handled by the query_string query.
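
As a rough illustration of such a check (written in Python for brevity; this is not the CirrusSearch code, and `is_simple_query` is a hypothetical name), the sketch below treats a query as simple when it contains only word characters and whitespace and none of the uppercase boolean operators. The exact rule set is an assumption and is deliberately conservative.

```python
import re

# Assumption: "simple" means only Unicode letters, digits, underscores and
# whitespace. Anything else (quotes, colons, wildcards, hyphens, ...) could be
# interpreted as query_string syntax, so it is rejected.
_SIMPLE = re.compile(r"[\w\s]+")
_OPERATORS = {"AND", "OR", "NOT"}


def is_simple_query(query: str) -> bool:
    """Return True if the query looks like plain search terms with no syntax."""
    if not query.strip():
        return False
    if _SIMPLE.fullmatch(query) is None:
        return False
    # Uppercase boolean keywords are operators in query_string syntax.
    return not any(token in _OPERATORS for token in query.split())
```

A stricter or looser character class may be more appropriate in practice; the point is only that the test can run before deciding whether the LTR rescore profile is allowed.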

Wed, Feb 19, 6:37 PM · Discovery-Search (Current work), Patch-For-Review, Chinese-Sites, CirrusSearch
EBernhardson added a comment to T219534: Test MLR models for zhwiki, jawiki and kowiki.

Talked about this today. Short term: investigate why the classic query building isn't generating an LTR query; it almost certainly should. If that isn't fruitful, we should ship the test without jawiki. Longer term: Elasticsearch is deprecating (turning into a noop) the phrase query we use here in 6.8, and removing it in 7.x. We need to re-evaluate the kuromoji analysis chain and hopefully move ja onto a proper analysis chain before upgrading to 6.8. The other spaceless languages are probably not big enough to need specific support; we can perhaps use shingles since the text content is minimal (outside jawiki).

Wed, Feb 19, 5:33 PM · Discovery-Search (Current work), Patch-For-Review, Chinese-Sites, CirrusSearch
EBernhardson committed rWDAN05e03d599a85: skein: Implement support for shipping results back to hdfs (authored by EBernhardson).
Wed, Feb 19, 12:26 AM

Tue, Feb 18

EBernhardson added a comment to T240556: Load ORES articletopic data into ElasticSearch via the weekly bulk update.

To fake the data, a couple parts:

Tue, Feb 18, 11:48 PM · Discovery-Search (Current work)
EBernhardson added a comment to T240556: Load ORES articletopic data into ElasticSearch via the weekly bulk update.

While this workflow is deployed, it's currently switched off in the airflow admin. My main thought there was that we are adding thresholding, and current runs aren't taking that into account. It seemed better to wait until per-wiki/topic thresholding was deployed before turning on the data shipping. This was briefly deployed and run for a week or two, before I realized we needed the updated thresholding.

Tue, Feb 18, 11:03 PM · Discovery-Search (Current work)
EBernhardson committed rWDAN70a9d019e361: Dont directly refer to runtime paths in operators (authored by EBernhardson).
Tue, Feb 18, 10:01 PM
EBernhardson committed rWDAN38e5c28dd346: Generalize ores predictions dag (authored by EBernhardson).
Tue, Feb 18, 10:01 PM
EBernhardson committed rWDAN27087e722101: Generalize skein spec test (authored by EBernhardson).
Tue, Feb 18, 10:01 PM
EBernhardson added a comment to T240550: Add mapping for ORES topic field in ElasticSearch.

This is still waiting for an in-place reindex before it is queryable. We were waiting on wmf.19 and an unrelated mapping change before running that. Now that that change is deployed along with this one, we should be able to run the reindex this week.

Tue, Feb 18, 9:48 PM · MW-1.35-notes (1.35.0-wmf.15; 2020-01-14), Discovery-Search (Current work)
EBernhardson updated subscribers of T219534: Test MLR models for zhwiki, jawiki and kowiki.

Reviewing the history, I think the primary concern related to bm25 and spaceless languages was:

Tue, Feb 18, 7:38 PM · Discovery-Search (Current work), Patch-For-Review, Chinese-Sites, CirrusSearch
EBernhardson added a comment to T245202: RESTBase 500 spike of all /page/related/ hits following 1.35.0-wmf.19 all-wiki deployment.

Random data points so I can find them next time we have this issue:

Tue, Feb 18, 6:44 PM · Patch-For-Review, CirrusSearch, Discovery-Search, Android-app-Bugs, iOS-app-Bugs, Wikipedia-iOS-App-Backlog, RESTBase, Wikipedia-Android-App-Backlog, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3))
EBernhardson closed T149755: Enable making configuration changes on a per-language basis as Resolved.
Tue, Feb 18, 4:57 PM · Patch-For-Review, MediaWiki-Configuration

Fri, Feb 14

EBernhardson added a comment to T245202: RESTBase 500 spike of all /page/related/ hits following 1.35.0-wmf.19 all-wiki deployment.

As an alternate solution, we may actually be able to drop the cache and repopulate it. There is a failover cluster matching the primary cluster in the codfw datacenter, and CirrusSearch has support to redirect classes of queries to particular clusters. We can direct all of these related articles requests to the codfw cluster, and with it not being occupied by the typical search load we may be able to serve the full request load as the cache refills.

Fri, Feb 14, 12:28 AM · Patch-For-Review, CirrusSearch, Discovery-Search, Android-app-Bugs, iOS-app-Bugs, Wikipedia-iOS-App-Backlog, RESTBase, Wikipedia-Android-App-Backlog, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3))

Thu, Feb 13

EBernhardson added a comment to T245202: RESTBase 500 spike of all /page/related/ hits following 1.35.0-wmf.19 all-wiki deployment.

Essentially what has happened here is that the namespace passed to Elasticsearch has changed types from a string to an integer. This changed the request, which invalidates the cache. As for the best way forward, I'm not sure. The most direct fix, of course, is to change that integer back into a string, but there isn't an obviously great place to do that.
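
A small sketch of why a type change alone is enough to miss the cache, assuming the cache key is derived from the serialized request (the real key derivation is not shown here, and the request shape and function names below are hypothetical):

```python
import hashlib
import json


def cache_key(request: dict) -> str:
    """Hash a canonical JSON serialization of the request (assumed key scheme)."""
    body = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def normalize_namespaces(request: dict) -> dict:
    """Cast namespace ids back to strings so old and new requests serialize alike."""
    fixed = dict(request)
    fixed["namespace"] = [str(ns) for ns in request["namespace"]]
    return fixed


# Hypothetical request bodies: identical except for the namespace type.
before = {"more_like": "Some page", "namespace": ["0"]}
after = {"more_like": "Some page", "namespace": [0]}

print(cache_key(before) == cache_key(after))                        # keys differ
print(cache_key(before) == cache_key(normalize_namespaces(after)))  # keys match again
```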

Thu, Feb 13, 11:36 PM · Patch-For-Review, CirrusSearch, Discovery-Search, Android-app-Bugs, iOS-app-Bugs, Wikipedia-iOS-App-Backlog, RESTBase, Wikipedia-Android-App-Backlog, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3))
EBernhardson added a comment to T245202: RESTBase 500 spike of all /page/related/ hits following 1.35.0-wmf.19 all-wiki deployment.

For the given time period, the hit rate of the CirrusSearch more_like cache, which is a second-level cache (the HTTP responses should be cached by traffic infra), dropped from ~75% to 5%, and climbed back to ~12% over the ten minutes of deployment. The number of successful requests doubled from ~200 to ~400, but this was not enough to handle the dramatic drop in hit rate.

Thu, Feb 13, 11:18 PM · Patch-For-Review, CirrusSearch, Discovery-Search, Android-app-Bugs, iOS-app-Bugs, Wikipedia-iOS-App-Backlog, RESTBase, Wikipedia-Android-App-Backlog, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3))
EBernhardson created P10395 (An Untitled Masterwork).
Thu, Feb 13, 2:27 AM

Mon, Feb 10

EBernhardson moved T237363: Undeploy Glent M0 A/B test from To Be Deployed to Done on the Discovery-Search (Current work) board.
Mon, Feb 10, 8:21 PM · MW-1.35-notes (1.35.0-wmf.8; 2019-11-26), Discovery-Search (Current work)
EBernhardson moved T237365: Enable Glent M0 on de, en and fr wikipedias from To Be Deployed to Done on the Discovery-Search (Current work) board.
Mon, Feb 10, 8:21 PM · Discovery-Search (Current work)
EBernhardson moved T237365: Enable Glent M0 on de, en and fr wikipedias from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Feb 10, 8:21 PM · Discovery-Search (Current work)
EBernhardson moved T237363: Undeploy Glent M0 A/B test from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Feb 10, 8:21 PM · MW-1.35-notes (1.35.0-wmf.8; 2019-11-26), Discovery-Search (Current work)
EBernhardson moved T238246: Add "source" to A/B test schema for DYM suggestions from In Progress to Needs review on the Discovery-Search (Current work) board.
Mon, Feb 10, 8:20 PM · MW-1.35-notes (1.35.0-wmf.21; 2020-02-25), Discovery-Search (Current work)
EBernhardson moved T240556: Load ORES articletopic data into ElasticSearch via the weekly bulk update from In Progress to Done on the Discovery-Search (Current work) board.
Mon, Feb 10, 8:15 PM · Discovery-Search (Current work)
EBernhardson moved T244310: FreezeWritesToCluster doesn't use API i18n functions from In Progress to Done on the Discovery-Search (Current work) board.
Mon, Feb 10, 7:17 PM · MW-1.35-notes (1.35.0-wmf.19; 2020-02-11), Discovery-Search (Current work), MediaWiki-API, Technical-Debt, CirrusSearch
EBernhardson moved T241953: Search should let you search for the title of a book in any language and give results accross languages. from needs triage to elastic / cirrus on the Discovery-Search board.
Mon, Feb 10, 6:55 PM · Discovery-Search, CirrusSearch, Wikisource
EBernhardson moved T243727: Internal API HTTP requests should hit the app server internal service IP, not the public HTTPS URL from elastic / cirrus to watching / waiting on the Discovery-Search board.
Mon, Feb 10, 6:55 PM · User-WDoran, Wikimedia-Incident, JsonConfig, SpamBlacklist, CirrusSearch, Discovery-Search, GlobalUserPage, Wikimedia-General-or-Unknown, MediaWiki-API, Core Platform Team, Performance Issue
EBernhardson moved T243727: Internal API HTTP requests should hit the app server internal service IP, not the public HTTPS URL from needs triage to elastic / cirrus on the Discovery-Search board.
Mon, Feb 10, 6:55 PM · User-WDoran, Wikimedia-Incident, JsonConfig, SpamBlacklist, CirrusSearch, Discovery-Search, GlobalUserPage, Wikimedia-General-or-Unknown, MediaWiki-API, Core Platform Team, Performance Issue
EBernhardson triaged T243796: Make a brief summary of search parameters (intitle, insource, incategory) as Medium priority.
Mon, Feb 10, 6:48 PM · Documentation, CirrusSearch, Discovery-Search
EBernhardson moved T243796: Make a brief summary of search parameters (intitle, insource, incategory) from needs triage to elastic / cirrus on the Discovery-Search board.
Mon, Feb 10, 6:48 PM · Documentation, CirrusSearch, Discovery-Search
EBernhardson triaged T244603: Parse error from mediawiki-1.34.0/includes/parser/ParserOutput.php crashes CirrusSearch's forceSearchIndex.php as Low priority.
Mon, Feb 10, 6:45 PM · MediaWiki-Parser, CirrusSearch, Discovery-Search
EBernhardson moved T244603: Parse error from mediawiki-1.34.0/includes/parser/ParserOutput.php crashes CirrusSearch's forceSearchIndex.php from needs triage to elastic / cirrus on the Discovery-Search board.
Mon, Feb 10, 6:45 PM · MediaWiki-Parser, CirrusSearch, Discovery-Search
EBernhardson claimed T244073: Cross-wiki search result should display namespace.
Mon, Feb 10, 6:42 PM · Discovery-Search (Current work), Patch-For-Review, CirrusSearch
EBernhardson moved T244073: Cross-wiki search result should display namespace from In Progress to Needs review on the Discovery-Search (Current work) board.
Mon, Feb 10, 6:42 PM · Discovery-Search (Current work), Patch-For-Review, CirrusSearch
EBernhardson moved T244073: Cross-wiki search result should display namespace from needs triage to Current work on the Discovery-Search board.
Mon, Feb 10, 6:41 PM · Discovery-Search (Current work), Patch-For-Review, CirrusSearch
EBernhardson updated the task description for T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.
Mon, Feb 10, 6:40 PM · Discovery-Search (Current work), Discovery
EBernhardson moved T223046: Lack of case sensitivity with hastemplate: from Waiting to Needs review on the Discovery-Search (Current work) board.
Mon, Feb 10, 6:38 PM · MW-1.35-notes (1.35.0-wmf.19; 2020-02-11), Patch-For-Review, MediaWiki-Search, Discovery-Search (Current work)
EBernhardson updated the task description for T244765: Refine failed for event.mediawiki_cirrussearch_request.
Mon, Feb 10, 5:31 PM · Analytics, Analytics-Cluster
EBernhardson created T244765: Refine failed for event.mediawiki_cirrussearch_request.
Mon, Feb 10, 5:31 PM · Analytics, Analytics-Cluster
EBernhardson added a comment to T244600: Interwiki Titles incorrectly return NS_MAIN from getNamespace().

This is not particularly urgent, more surprising. I'm fixing a bug elsewhere, in code that determines whether a bare title string returned from an external source needs to be namespace prefixed; it worked for local titles but not for interwiki titles, since interwiki reports as NS_MAIN. I understand this is an assumption that is likely baked in all over the place and unlikely to be fixed.

Mon, Feb 10, 5:07 PM · MediaWiki-General, Core Platform Team

Fri, Feb 7

EBernhardson created T244600: Interwiki Titles incorrectly return NS_MAIN from getNamespace().
Fri, Feb 7, 8:18 PM · MediaWiki-General, Core Platform Team
EBernhardson created P10327 Query to fetch sample of cirrussearch hastemplate requests from hive logs.
Fri, Feb 7, 1:54 AM
EBernhardson added a comment to T244549: Enable phpdbg on mwdebug* servers.

In terms of actual deployment I think we can simply install the php-phpdbg package (available from our php7.2 deb component) and adjust MWScript.php to allow the 'phpdbg' SAPI in addition to the 'cli' SAPI that it currently allows.

Fri, Feb 7, 1:08 AM · Patch-For-Review, Release-Engineering-Team, serviceops
EBernhardson created T244549: Enable phpdbg on mwdebug* servers.
Fri, Feb 7, 1:06 AM · Patch-For-Review, Release-Engineering-Team, serviceops
EBernhardson added a comment to T244297: Newcomer tasks: set initial thresholds for ORES articletopic.

Another concern I just realized with respect to thresholds is updating the models. If a new articletopic model is released and the topic A threshold goes from 0.9 to 0.8, we will still have the old scores indexed, with no real way to distinguish which version of the model a prediction came from.

Fri, Feb 7, 12:55 AM · Scoring-platform-team (Current), Discovery-Search (Current work), Growth-Team (Current Sprint)
EBernhardson added a comment to T244297: Newcomer tasks: set initial thresholds for ORES articletopic.

They could also live in the script that loads data from Hadoop to ES (and currently uses a cutoff of 0.5 for discarding low scores). That would reduce ES space usage, but seems like an even more unpleasant location to manage such config, especially since it will be different for each wiki (or will it? thresholds, for sure, but threshold definitions?)
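
A minimal sketch of per-topic thresholding at load time, assuming predictions arrive as a dict from topic label to probability. The 0.5 default mirrors the cutoff mentioned above; the function name, topic labels, and per-topic values are illustrative assumptions, not the deployed configuration.

```python
from typing import Dict

DEFAULT_THRESHOLD = 0.5  # the flat cutoff mentioned in the comment


def apply_thresholds(
    predictions: Dict[str, float],
    thresholds: Dict[str, float],
    default: float = DEFAULT_THRESHOLD,
) -> Dict[str, float]:
    """Keep only topics whose probability meets that topic's threshold."""
    return {
        topic: prob
        for topic, prob in predictions.items()
        if prob >= thresholds.get(topic, default)
    }


# Illustrative values only.
predictions = {"art": 0.85, "physics": 0.6, "history": 0.3}
print(apply_thresholds(predictions, {"art": 0.9}))
print(apply_thresholds(predictions, {"art": 0.8}))
```

Per-wiki configuration could be layered on top by keeping one `thresholds` dict per wiki id and selecting it before the call.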

Fri, Feb 7, 12:48 AM · Scoring-platform-team (Current), Discovery-Search (Current work), Growth-Team (Current Sprint)

Thu, Feb 6

EBernhardson triaged T244073: Cross-wiki search result should display namespace as Medium priority.

This looks to be caused by the text highlighting. If we search for User: "greetings from new zealand", the interwiki result is displayed as User:Robin Patterson, but if there is highlighting available for the title, that is used, and it does not include the namespace prefix. In full text search, title highlighting appropriately provides the namespace prefix.

Thu, Feb 6, 8:49 PM · Discovery-Search (Current work), Patch-For-Review, CirrusSearch
EBernhardson moved T244192: Newcomer tasks: ORES ontology mapping and score thresholds from needs triage to Current work on the Discovery-Search board.
Thu, Feb 6, 8:42 PM · Scoring-platform-team (Current), Discovery-Search (Current work), Growth-Team (Current Sprint)
EBernhardson moved T244297: Newcomer tasks: set initial thresholds for ORES articletopic from needs triage to Current work on the Discovery-Search board.
Thu, Feb 6, 8:42 PM · Scoring-platform-team (Current), Discovery-Search (Current work), Growth-Team (Current Sprint)
EBernhardson moved T244310: FreezeWritesToCluster doesn't use API i18n functions from needs triage to Current work on the Discovery-Search board.
Thu, Feb 6, 8:42 PM · MW-1.35-notes (1.35.0-wmf.19; 2020-02-11), Discovery-Search (Current work), MediaWiki-API, Technical-Debt, CirrusSearch
EBernhardson claimed T244310: FreezeWritesToCluster doesn't use API i18n functions.
Thu, Feb 6, 8:42 PM · MW-1.35-notes (1.35.0-wmf.19; 2020-02-11), Discovery-Search (Current work), MediaWiki-API, Technical-Debt, CirrusSearch
EBernhardson added a comment to T244310: FreezeWritesToCluster doesn't use API i18n functions.

In this case I think i18n was avoided because this API module is only included when running the integration testing suite; it should never be available on a real site. As such there would be no point in translating it. Will simply drop the function.

Thu, Feb 6, 8:38 PM · MW-1.35-notes (1.35.0-wmf.19; 2020-02-11), Discovery-Search (Current work), MediaWiki-API, Technical-Debt, CirrusSearch
EBernhardson moved T244421: Newcomer tasks: UX changes for ORES topics from needs triage to watching / waiting on the Discovery-Search board.
Thu, Feb 6, 8:36 PM · Patch-For-Review, Growth Design, Growth-Team (Current Sprint), Scoring-platform-team, Discovery-Search
EBernhardson moved T244487: WikibaseCirrusSearch and WikibaseLexemeCirrusSearch tests should run with WIkibase in CI from needs triage to watching / waiting on the Discovery-Search board.
Thu, Feb 6, 8:36 PM · Lexicographical data, CirrusSearch, Continuous-Integration-Config, Wikidata, Discovery, Discovery-Search
EBernhardson added a comment to T244297: Newcomer tasks: set initial thresholds for ORES articletopic.

How do we think these thresholds should be applied? It sounds like we need to inject them prior to the indexing pipeline?

Thu, Feb 6, 12:04 AM · Scoring-platform-team (Current), Discovery-Search (Current work), Growth-Team (Current Sprint)

Wed, Feb 5

EBernhardson committed rWDAN524be2bb4013: skein_operator: Ensure hook is creatable (authored by EBernhardson).
Wed, Feb 5, 10:39 PM
EBernhardson moved T230495: Partition CirrusSearch mediawiki jobs by cluster from To Be Deployed to Done on the Discovery-Search (Current work) board.
Wed, Feb 5, 10:20 PM · MW-1.35-notes (1.35.0-wmf.8; 2019-11-26), Discovery-Search (Current work), Core Platform Team Workboards (Clinic Duty Team), Cloud-Services, Elasticsearch, Discovery
EBernhardson committed rWDAN280106eab40b: mw_rev_score: Properly pass aliased field to insert stmt (authored by EBernhardson).
Wed, Feb 5, 8:48 PM
EBernhardson committed rWDAN3373c9c3fb7b: Rename drafttopics to articletopics (authored by EBernhardson).
Wed, Feb 5, 1:53 PM

Tue, Feb 4

EBernhardson added a comment to T242476: Newcomer tasks: when selecting multiple topics, one topic should not dominate over the others.

Cool, that sounds a lot less effort to maintain than per-topic weights. If defining a custom function is easy, maybe we could use the max of the sigmoids instead of the sum? I think that's closer to how topic search is intended to work (if I check that I'm interested in art and physics, I would want articles that are about art or physics, not necessarily both).
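
To make the difference between summing and taking the maximum concrete, here is a small sketch. It assumes each topic query yields a raw relevance score and uses a saturating sigmoid of the form score^a / (k^a + score^a); that form, the parameters `k` and `a`, and the scores are assumptions for illustration and are not taken from the pastes.

```python
from typing import Sequence


def sigmoid(score: float, k: float = 10.0, a: float = 1.0) -> float:
    """Saturating transform mapping a non-negative score into [0, 1). Requires k > 0."""
    return score**a / (k**a + score**a)


def combine_sum(scores: Sequence[float]) -> float:
    """Sum of sigmoids: rewards documents that match several topics at once."""
    return sum(sigmoid(s) for s in scores)


def combine_max(scores: Sequence[float]) -> float:
    """Max of sigmoids: rewards a strong match on any single topic."""
    return max(sigmoid(s) for s in scores)


# Hypothetical raw scores against (art, physics).
strong_on_one = (30.0, 0.0)   # clearly about art, not about physics
weak_on_both = (15.0, 15.0)   # moderately about both

for name, combine in (("sum", combine_sum), ("max", combine_max)):
    print(name, combine(strong_on_one), combine(weak_on_both))
```

With these numbers the sum prefers the document that touches both topics, while the max prefers the document that is strongly about one of them, which is the "art or physics, not necessarily both" behavior described above.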

Tue, Feb 4, 10:40 PM · MW-1.35-notes (1.35.0-wmf.14; 2020-01-07), Growth-Team, NewcomerTasks 1.1
EBernhardson edited P10309 maximum of sigmoid of two separate morelikethis queries.
Tue, Feb 4, 9:04 PM
EBernhardson created P10309 maximum of sigmoid of two separate morelikethis queries.
Tue, Feb 4, 9:00 PM
EBernhardson created P10308 maximum of two separate morelikethis queries.
Tue, Feb 4, 8:46 PM
EBernhardson created P10307 separate morelike for two pages.
Tue, Feb 4, 8:35 PM
EBernhardson updated the title for P10306 Simple morelike query for two pages from Simple morelike query for single page id to Simple morelike query for two pages.
Tue, Feb 4, 8:29 PM
EBernhardson created P10306 Simple morelike query for two pages.
Tue, Feb 4, 8:17 PM
EBernhardson added a comment to T240556: Load ORES articletopic data into ElasticSearch via the weekly bulk update.

Patch is up to change the analytics side to articletopic as well. Since some ores_drafttopic data has already been shipped, we will need to remember to ask the update script to delete that field from the source documents when doing the reindex to add ores_articletopic to the schema.

Tue, Feb 4, 6:12 PM · Discovery-Search (Current work)

Jan 24 2020

EBernhardson added a comment to T242476: Newcomer tasks: when selecting multiple topics, one topic should not dominate over the others.

There are two main options I can think of for merging essentially two separate morelikethis queries trying to give them equal weight:

Jan 24 2020, 8:52 PM · MW-1.35-notes (1.35.0-wmf.14; 2020-01-07), Growth-Team, NewcomerTasks 1.1

Jan 21 2020

EBernhardson created P10237 (An Untitled Masterwork).
Jan 21 2020, 7:13 PM
EBernhardson committed rWDANae77f9d0bd1c: Import ores_drafttopics (authored by EBernhardson).
Jan 21 2020, 8:55 AM

Jan 17 2020

EBernhardson committed rWDAN938d253cbba0: Generalize transfer_to_es to support multiple inputs (authored by EBernhardson).
Jan 17 2020, 10:17 AM
EBernhardson committed rWDAN47aab9863968: Port weekly cirrus data updates to airflow (authored by EBernhardson).
Jan 17 2020, 9:52 AM

Jan 15 2020

EBernhardson triaged T240556: Load ORES articletopic data into ElasticSearch via the weekly bulk update as Medium priority.
Jan 15 2020, 6:31 PM · Discovery-Search (Current work)
EBernhardson claimed T240556: Load ORES articletopic data into ElasticSearch via the weekly bulk update.
Jan 15 2020, 6:31 PM · Discovery-Search (Current work)
EBernhardson moved T240556: Load ORES articletopic data into ElasticSearch via the weekly bulk update from watching / waiting to Current work on the Discovery-Search board.
Jan 15 2020, 6:31 PM · Discovery-Search (Current work)

Jan 14 2020

EBernhardson committed rWDANb98561d4f9b2: Split SkeinPlugin into hook/operator (authored by EBernhardson).
Jan 14 2020, 6:33 PM
EBernhardson added a comment to T240520: Produce dumps of commons thumbnail URLs.

Can we run one process doing commons and another/others doing the rest,

Jan 14 2020, 1:23 AM · Patch-For-Review, Dumps-Generation, Internet-Archive, Datasets-Archiving

Jan 13 2020

EBernhardson added a comment to T242348: Investigate low resource usage on elastic1061-67.

Followup on how the weights got set to 0 to start with:

Jan 13 2020, 9:56 PM · Discovery-Search (Current work)
EBernhardson moved T231840: Make the si_page column in the searchindex table always in numerical order from needs triage to later on... on the Discovery-Search board.
Jan 13 2020, 7:50 PM · Discovery-Search, MediaWiki-Search
EBernhardson moved T240702: mediawiki.job.cirrusSearchElasticaWrite topics need more partitions! from needs triage to Ops / SRE on the Discovery-Search board.
Jan 13 2020, 7:50 PM · Core Platform Team Workboards (Clinic Duty Team), Discovery-Search
EBernhardson moved T241969: Wrong suggestions shown by glent M0 from needs triage to elastic / cirrus on the Discovery-Search board.
Jan 13 2020, 7:50 PM · Discovery-Search, CirrusSearch
EBernhardson triaged T241969: Wrong suggestions shown by glent M0 as Medium priority.
Jan 13 2020, 7:49 PM · Discovery-Search, CirrusSearch
EBernhardson added a comment to T242327: QINU appears instead of math in search results.

Looking at the HTML output for one example page we have:

<span style="display:none" class="sortkey">Durener Straße&#160;040 '"`UNIQ--nowiki-00000009-QINU`"' </span>
Jan 13 2020, 7:48 PM · Discovery-Search, MediaWiki-Parser, Math
EBernhardson triaged T242492: Add weight parameter to morelikethis CirrusSearch feature as Medium priority.
Jan 13 2020, 7:36 PM · Discovery-Search (Current work), CirrusSearch, Growth-Team
EBernhardson moved T242492: Add weight parameter to morelikethis CirrusSearch feature from needs triage to Current work on the Discovery-Search board.
Jan 13 2020, 7:36 PM · Discovery-Search (Current work), CirrusSearch, Growth-Team
EBernhardson committed rWDAN7b40d2aaa862: Define output paths more explicitly (authored by EBernhardson).
Jan 13 2020, 4:49 PM