
EBernhardson (EBernhardson)
User

User Details

User Since
Oct 7 2014, 4:49 PM (452 w, 3 d)
Availability
Available
LDAP User
EBernhardson
MediaWiki User
EBernhardson (WMF) [ Global Accounts ]

Recent Activity

Mon, Jun 5

EBernhardson moved T333468: Use the mediawiki.revision_score_drafttopic stream instead of mediawiki.revision-score from Blocked/Waiting to Needs review on the Discovery-Search (Current work) board.
Mon, Jun 5, 3:21 PM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch

Fri, May 26

EBernhardson awarded T244840: Evaluate options for non-root operations with cumin and spicerack cookbooks a Love token.
Fri, May 26, 3:11 PM · Cumin, Infrastructure-Foundations, SRE

Wed, May 24

EBernhardson moved T334194: Optimize the elasticsearch analysis settings for wikibase from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Wed, May 24, 6:33 PM · MW-1.41-notes (1.41.0-wmf.11; 2023-05-30), Discovery-Search (Current work), CirrusSearch

Mon, May 22

EBernhardson moved T331300: Ensure WCQS/WDQS stack works on Bullseye from Ready for Dev -- SRE/Ops to In Progress on the Discovery-Search (Current work) board.
Mon, May 22, 3:25 PM · Data-Platform-SRE, Discovery-Search (Current work)
EBernhardson moved T331300: Ensure WCQS/WDQS stack works on Bullseye from In Progress to Ready for Dev -- SRE/Ops on the Discovery-Search (Current work) board.
Mon, May 22, 3:25 PM · Data-Platform-SRE, Discovery-Search (Current work)
EBernhardson moved T327199: on-wiki search is failing to find relatively newer titles on enwiki from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mon, May 22, 3:15 PM · Discovery-Search (Current work), CirrusSearch

Thu, May 18

EBernhardson moved T334470: Federated queries to Lingua Libre time out in the Commons query service from Needs review to Needs Reporting on the Discovery-Search (Current work) board.

These queries look to be running as expected now.

Thu, May 18, 5:33 PM · Discovery-Search (Current work), Wikidata, Lingua Libre, Commons, Wikidata-Query-Service

Wed, May 17

EBernhardson claimed T334194: Optimize the elasticsearch analysis settings for wikibase.
Wed, May 17, 9:57 PM · MW-1.41-notes (1.41.0-wmf.11; 2023-05-30), Discovery-Search (Current work), CirrusSearch

Tue, May 16

EBernhardson added a comment to T334194: Optimize the elasticsearch analysis settings for wikibase.

To get an idea of what we need to optimize, I ran an experiment. It stands up a fresh Elasticsearch instance, creates 100 indexes with the same settings, and restarts the instance after every 10 indexes. I measured how long the instance takes to come up and how long each index takes to create. I ran this experiment with 4 different index configurations:

Tue, May 16, 8:12 PM · MW-1.41-notes (1.41.0-wmf.11; 2023-05-30), Discovery-Search (Current work), CirrusSearch
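The experiment loop described above can be sketched as a small timing harness. This is a hypothetical Python sketch, not the script that was actually run: create_index and restart_instance are placeholder callables you would wire up to the Elasticsearch instance under test, and the experiment_NNN index names are invented.

```python
import time
from typing import Callable, Dict, List


def run_index_experiment(
    create_index: Callable[[str], None],
    restart_instance: Callable[[], None],
    num_indices: int = 100,
    restart_every: int = 10,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, List[float]]:
    """Create indices one at a time, restarting the instance after every
    `restart_every` creations, and record how long each step takes."""
    results: Dict[str, List[float]] = {"create_seconds": [], "restart_seconds": []}
    for i in range(num_indices):
        # Time a single index creation (placeholder callable).
        start = clock()
        create_index(f"experiment_{i:03d}")
        results["create_seconds"].append(clock() - start)

        # Restart once every `restart_every` indices have been created.
        if (i + 1) % restart_every == 0:
            start = clock()
            restart_instance()  # assumed to block until the instance is back up
            results["restart_seconds"].append(clock() - start)
    return results
```

Passing no-op callables is a quick way to check the loop structure: 100 creations produce 10 restart timings.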

Mon, May 15

EBernhardson moved T335873: Special:Search broken on Beta Wikidata for entity namespaces from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

Reindex complete; it looks to have resolved the issue as expected.

Mon, May 15, 8:23 PM · Discovery-Search (Current work), wdwb-tech, Beta-Cluster-Infrastructure, Wikidata
EBernhardson claimed T335873: Special:Search broken on Beta Wikidata for entity namespaces.

Search backend error during entity_full_text search for 'test' after 35: Parse error on Cannot search on field [labels.en] since it is not indexed.

Mon, May 15, 6:31 PM · Discovery-Search (Current work), wdwb-tech, Beta-Cluster-Infrastructure, Wikidata
EBernhardson moved T333468: Use the mediawiki.revision_score_drafttopic stream instead of mediawiki.revision-score from In Progress to Blocked/Waiting on the Discovery-Search (Current work) board.
Mon, May 15, 3:13 PM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch
EBernhardson moved T330936: Missing Cirrussearch dump (enwiki and wikidata) from Blocked/Waiting to Needs Reporting on the Discovery-Search (Current work) board.

This has been deployed for a month without issues; let's hope this is resolved.

Mon, May 15, 3:11 PM · MW-1.41-notes (1.41.0-wmf.5; 2023-04-17), Discovery-Search (Current work), CirrusSearch, Dumps-Generation
EBernhardson moved T336519: Investigate puppet failure on cirrus-integ03.search.eqiad1.wikimedia.cloud from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mon, May 15, 3:08 PM · Discovery-Search (Current work)

Thu, May 11

EBernhardson claimed T334470: Federated queries to Lingua Libre time out in the Commons query service.
Thu, May 11, 7:36 PM · Discovery-Search (Current work), Wikidata, Lingua Libre, Commons, Wikidata-Query-Service
EBernhardson moved T334823: Add https://opendata.aragon.es/sparql to the list of federated endpoints for WDQS and WCQS from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Thu, May 11, 6:53 PM · Wikidata, Discovery-Search (Current work), Wikidata-Query-Service
EBernhardson added a comment to T336443: Investigate performance differences between wdqs2022 and older hosts.

The most notable difference in metrics I see is in the per-host disk utilization of the cluster overview dashboard. During the backfilling period, all the other codfw hosts report a max per-disk value of around 10%. For 2002, half the disks were at 25% and the other half at 45-50%.

Thu, May 11, 5:00 PM · Discovery-Search (Current work)

May 10 2023

EBernhardson moved T336076: rdf-streaming-updater-producer logs seem to be missing from Incoming to Needs Reporting on the Discovery-Search (Current work) board.

Flink logs are ECS-formatted and can be found under the ecs-* index pattern using the query orchestrator.namespace:"rdf-streaming-updater". On April 18th, we allowed ECS-formatted logs < 1.7.0 to enter later ECS versioned indexes: T292585.

May 10 2023, 7:12 PM · Observability-Logging, Discovery-Search (Current work)
EBernhardson updated the task description for T336076: rdf-streaming-updater-producer logs seem to be missing.
May 10 2023, 4:57 PM · Observability-Logging, Discovery-Search (Current work)
EBernhardson added a project to T336076: rdf-streaming-updater-producer logs seem to be missing: Observability-Logging.
May 10 2023, 4:55 PM · Observability-Logging, Discovery-Search (Current work)

May 9 2023

EBernhardson claimed T333468: Use the mediawiki.revision_score_drafttopic stream instead of mediawiki.revision-score.
May 9 2023, 8:56 PM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch
EBernhardson claimed T334823: Add https://opendata.aragon.es/sparql to the list of federated endpoints for WDQS and WCQS.
May 9 2023, 5:18 PM · Wikidata, Discovery-Search (Current work), Wikidata-Query-Service

May 8 2023

EBernhardson claimed T335551: Search form inside Special:Search should render autocapitalize attribute that respect $wgCapitalLinks.
May 8 2023, 6:35 PM · MW-1.41-notes (1.41.0-wmf.10; 2023-05-23), Discovery-Search (Current work), MediaWiki-Search, Mobile
EBernhardson added a comment to T334194: Optimize the elasticsearch analysis settings for wikibase.

If this is about Wikibase.cloud, how much should we de-duplicate vs reduce the numbers of supported languages? My intuition is that on Wikibase.cloud, the number of languages per instance is unlikely to be as high as on Wikibase. Would that also be an option? Would that help?

May 8 2023, 5:35 PM · MW-1.41-notes (1.41.0-wmf.11; 2023-05-30), Discovery-Search (Current work), CirrusSearch

May 5 2023

EBernhardson created T336076: rdf-streaming-updater-producer logs seem to be missing.
May 5 2023, 5:14 PM · Observability-Logging, Discovery-Search (Current work)

May 4 2023

EBernhardson added a comment to T334194: Optimize the elasticsearch analysis settings for wikibase.

Is there anywhere where I can briefly read up on what analyzers, token filters and char filters are in this context? Then I can probably help.

May 4 2023, 6:33 PM · MW-1.41-notes (1.41.0-wmf.11; 2023-05-30), Discovery-Search (Current work), CirrusSearch

May 3 2023

EBernhardson added a comment to T295735: Decide on the future of Cirrus development/integration environment.

As part of migrating our integration testing out of vagrant, I put together an environment based on mwcli / mwdd that runs enough of CirrusSearch to pass the integration tests: https://gitlab.wikimedia.org/repos/search-platform/cirrus-integration-test-runner/

May 3 2023, 10:21 PM · CirrusSearch, Discovery-Search

May 2 2023

EBernhardson created P47293 basic rdf query service logback.xml.
May 2 2023, 5:02 PM

Apr 27 2023

EBernhardson moved T332355: Deploy Turkish Analyzer Plugin from In Progress to Needs review on the Discovery-Search (Current work) board.

Released a new version of the plugin, 7.10.2-wmf8, to Maven Central. Once the Debian packaging is done and available, we can start updating the places that use it.

Apr 27 2023, 10:36 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson claimed T332355: Deploy Turkish Analyzer Plugin.
Apr 27 2023, 5:53 PM · Patch-For-Review, Discovery-Search (Current work)

Apr 26 2023

EBernhardson moved T328276: Add outlink topic model predictions to CirrusSearch indices from Ready for Dev -- SRE/Ops to Ready for Dev -- SWE on the Discovery-Search (Current work) board.
Apr 26 2023, 7:27 PM · Discovery-Search (Current work), Machine-Learning-Team, CirrusSearch
EBernhardson moved T328276: Add outlink topic model predictions to CirrusSearch indices from In Progress to Ready for Dev -- SRE/Ops on the Discovery-Search (Current work) board.
Apr 26 2023, 7:26 PM · Discovery-Search (Current work), Machine-Learning-Team, CirrusSearch
EBernhardson claimed T327199: on-wiki search is failing to find relatively newer titles on enwiki.
Apr 26 2023, 7:26 PM · Discovery-Search (Current work), CirrusSearch

Apr 25 2023

EBernhardson added a comment to T327199: on-wiki search is failing to find relatively newer titles on enwiki.

We can use an Elasticsearch query to find the completion indices with the oldest build dates. This query will give us the 5 titlesuggest indices with the oldest batch_id (~= indexing timestamp) when issued against the :9243 cluster:

Apr 25 2023, 7:21 PM · Discovery-Search (Current work), CirrusSearch
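The query itself is not reproduced in this feed. As an illustration only (not the query from the comment), one way to express "the 5 indices with the oldest batch_id" is a terms aggregation on the _index metadata field, ordered by a min sub-aggregation on batch_id. This assumes every titlesuggest document carries a numeric batch_id, and that the body is sent to a _search endpoint covering the titlesuggest indices.

```python
import json


def oldest_completion_indices_query(n: int = 5) -> dict:
    """Search body returning the `n` indices with the smallest (oldest)
    batch_id. Assumes documents carry a numeric `batch_id` field."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "by_index": {
                "terms": {
                    "field": "_index",  # bucket documents by the index they live in
                    "size": n,
                    # Order buckets by the sub-aggregation, oldest first.
                    "order": {"oldest_batch": "asc"},
                },
                "aggs": {"oldest_batch": {"min": {"field": "batch_id"}}},
            }
        },
    }


def request_body(n: int = 5) -> str:
    """Serialize the query for use as an HTTP request body."""
    return json.dumps(oldest_completion_indices_query(n))
```

Each bucket key in the response is an index name, and the oldest_batch value is the smallest batch_id seen in that index.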
EBernhardson added a comment to T327199: on-wiki search is failing to find relatively newer titles on enwiki.

Poking at the logs for the script that builds the daily autocomplete indices, it looks like we may be missing errors that happen there. The logs show that the enwiki completion index failed its daily build from Dec 9, 2022 through Jan 20, 2023. None of our monitoring caught this; we should correct that so these errors surface sooner and get fixed immediately.
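A monitoring check for this failure mode could compare each wiki's last successful completion build against a maximum age. The sketch below is hypothetical: it assumes the build times are already available (for example derived from batch_id), and the two-day default is an assumed threshold that leaves some slack around a daily schedule.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def stale_completion_indices(
    last_built: Dict[str, datetime],
    max_age: timedelta = timedelta(days=2),
    now: Optional[datetime] = None,
) -> List[str]:
    """Return wikis whose completion index has not been rebuilt within
    `max_age`, oldest build first. Timestamps must be timezone-aware."""
    if now is None:
        now = datetime.now(timezone.utc)
    stale = [wiki for wiki, built in last_built.items() if now - built > max_age]
    # Oldest first, so the longest-running failure is reported at the top.
    return sorted(stale, key=lambda wiki: last_built[wiki])
```

A non-empty result would be the signal to alert on, instead of relying on someone reading the build logs.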

Apr 25 2023, 6:28 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T327199: on-wiki search is failing to find relatively newer titles on enwiki.

To check that titles are making it into the primary search index, I wrote a quick Python script (P47281) and ran it against the last 7 days' worth of new pages according to the recent changes API. It found 12572 pages that were created and should exist in the enwiki search index. Of these, 12 were not found in the search index. A manual check shows them all to be redirects to redirects, which we don't index. This looks to be generally working, although there could certainly be edge cases that are not handled.
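The shape of that check is: list page creations from the recent changes API, then diff them against what the search index contains. P47281 is not reproduced here, so the following is a hypothetical Python sketch. The list=recentchanges parameters are standard MediaWiki Action API parameters; how the set of indexed titles is obtained (for example via CirrusSearch's prop=cirrusdoc API module) is left as an assumption.

```python
from typing import Iterable, List, Set
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def new_pages_url(start: str, end: str) -> str:
    """URL listing main-namespace page creations between two ISO 8601
    timestamps. With the default rcdir=older, `start` is the later one."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "new",        # only page creations
        "rcnamespace": 0,       # articles only
        "rcstart": start,
        "rcend": end,
        "rcprop": "title|ids",
        "rclimit": "max",
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def missing_from_index(new_titles: Iterable[str], indexed_titles: Set[str]) -> List[str]:
    """Titles created in the window that the search index does not contain."""
    return sorted(set(new_titles) - indexed_titles)
```

Anything returned by missing_from_index still needs a manual look; in the run above, every miss turned out to be a double redirect.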

Apr 25 2023, 6:20 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson created P47281 check for new pages in search index.
Apr 25 2023, 6:13 PM
EBernhardson added a comment to T327199: on-wiki search is failing to find relatively newer titles on enwiki.

I tried to clarify the section of docs about updates to include the distinction between full-text and title completion search indexes: https://www.mediawiki.org/w/index.php?title=Help:CirrusSearch&diff=prev&oldid=5897819

Apr 25 2023, 4:32 PM · Discovery-Search (Current work), CirrusSearch

Apr 24 2023

EBernhardson claimed T328276: Add outlink topic model predictions to CirrusSearch indices.
Apr 24 2023, 8:58 PM · Discovery-Search (Current work), Machine-Learning-Team, CirrusSearch
EBernhardson moved T328276: Add outlink topic model predictions to CirrusSearch indices from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board.
Apr 24 2023, 8:58 PM · Discovery-Search (Current work), Machine-Learning-Team, CirrusSearch
EBernhardson moved T333183: Migrate cindy-the-browser-test-bot to a docker based runner from In Progress to Needs review on the Discovery-Search (Current work) board.
Apr 24 2023, 7:52 PM · MW-1.41-notes (1.41.0-wmf.9; 2023-05-15), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T333183: Migrate cindy-the-browser-test-bot to a docker based runner.

Took a quick pass at the docs on wikitech as well: https://wikitech.wikimedia.org/wiki/Cindy_The_Browser_Test_Bot

Apr 24 2023, 6:21 PM · MW-1.41-notes (1.41.0-wmf.9; 2023-05-15), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T333183: Migrate cindy-the-browser-test-bot to a docker based runner.

Updated the dependencies on the Cirrus side from node 10 / wdio v5 to node 14 / wdio v7, which brings us a few years forward. I didn't try node 16 / wdio v8. One issue we will soon run into is that this test suite was written with synchronous wdio, which is no longer supported starting with Node.js v16.

Apr 24 2023, 6:07 PM · MW-1.41-notes (1.41.0-wmf.9; 2023-05-15), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
EBernhardson moved T330936: Missing Cirrussearch dump (enwiki and wikidata) from Needs review to Blocked/Waiting on the Discovery-Search (Current work) board.
Apr 24 2023, 5:58 PM · MW-1.41-notes (1.41.0-wmf.5; 2023-04-17), Discovery-Search (Current work), CirrusSearch, Dumps-Generation
EBernhardson moved T328332: Add a new keyword to filter pages based on their "length" from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Apr 24 2023, 5:56 PM · MW-1.41-notes (1.41.0-wmf.9; 2023-05-15), MW-1.40-notes (1.40.0-wmf.24; 2023-02-20), Discovery-Search (Current work), CirrusSearch, GrowthExperiments-Homepage, Growth-Team
EBernhardson set the point value for T328276: Add outlink topic model predictions to CirrusSearch indices to 5.
Apr 24 2023, 3:50 PM · Discovery-Search (Current work), Machine-Learning-Team, CirrusSearch
EBernhardson moved T325672: Re-order and optimize change events from Incoming to Ready for Dev -- SRE/Ops on the Discovery-Search (Current work) board.
Apr 24 2023, 3:42 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson moved T218315: cleanup the custom elasticsearch_${version}@ systemd unit in favor of an override configuration from Incoming to Ready for Dev -- SRE/Ops on the Discovery-Search (Current work) board.
Apr 24 2023, 3:40 PM · Discovery-Search, Elasticsearch
EBernhardson moved T325565: Add support for page re-renders from Incoming to Ready for Dev -- SWE on the Discovery-Search (Current work) board.
Apr 24 2023, 3:38 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson set the point value for T332314: Configure new WDQS servers in codfw (wdqs20[13-22]) to 5.
Apr 24 2023, 3:37 PM · Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
EBernhardson set the point value for T331297: Audit/update NIC firmware on Search Platform-owned Buster hosts to 3.
Apr 24 2023, 3:36 PM · Discovery-Search (Current work)
EBernhardson moved T335066: Understand "this node stopped indexing” alerts (Elasticsearch) from needs triage to Ops / SRE on the Discovery-Search board.
Apr 24 2023, 3:30 PM · Discovery-Search
EBernhardson moved T334681: Wikimedia\Assert\PostconditionException: Postcondition failed: Regex failed: 4 from needs triage to Current work on the Discovery-Search board.
Apr 24 2023, 3:23 PM · Discovery-Search (Current work), CirrusSearch, Wikimedia-production-error
EBernhardson added a comment to T334194: Optimize the elasticsearch analysis settings for wikibase.

We should measure the gain. We already have a component that can deduplicate (AnalysisFilter), but we should test whether it has any useful effect. Based on the change in index creation time, we can decide whether it should move forward.
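To make the deduplication idea concrete: token filters whose definitions are identical can be collapsed to a single name, with analyzers rewritten to reference the surviving name. This is a hypothetical sketch of the general technique, not how AnalysisFilter is implemented; dedupe_filters and apply_renames are invented names, and the same approach could be applied to char filters.

```python
import json
from typing import Dict, Tuple


def dedupe_filters(filters: Dict[str, dict]) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Collapse token filters with identical definitions.

    Returns the reduced filter map and a rename map (old name -> kept name)
    that analyzers must apply to their `filter` lists."""
    canonical: Dict[str, str] = {}  # serialized definition -> first name seen
    kept: Dict[str, dict] = {}
    renames: Dict[str, str] = {}
    for name in sorted(filters):  # sorted so the kept name is deterministic
        key = json.dumps(filters[name], sort_keys=True)  # key order doesn't matter
        if key in canonical:
            renames[name] = canonical[key]
        else:
            canonical[key] = name
            kept[name] = filters[name]
    return kept, renames


def apply_renames(analyzers: Dict[str, dict], renames: Dict[str, str]) -> Dict[str, dict]:
    """Point each analyzer's filter chain at the surviving filter names.
    Built-in filters such as `lowercase` are left untouched."""
    return {
        name: {**analyzer, "filter": [renames.get(f, f) for f in analyzer.get("filter", [])]}
        for name, analyzer in analyzers.items()
    }
```

Comparing index creation time with and without this step is the measurement the comment asks for.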

Apr 24 2023, 3:19 PM · MW-1.41-notes (1.41.0-wmf.11; 2023-05-30), Discovery-Search (Current work), CirrusSearch
EBernhardson moved T332953: Migrate PipelineLib repos to GitLab from Incoming to Blocked/Waiting on the Discovery-Search (Current work) board.
Apr 24 2023, 3:15 PM · Data-Platform-SRE, Discovery-Search (Current work), API Platform, Shared-Data-Infrastructure, Patch-For-Review, Data Pipelines, wdwb-tech, Wikidata, Security-Team, SRE, Wikidata-Campsite, Anti-Harassment, Wikispeech, Structured-Data-Backlog, Platform Engineering, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Editing-team, Content-Transform-Team, Metrics-Platform-Planning, Machine-Learning-Team, Cloud-Services, GitLab (Project Migration), Release-Engineering-Team (Priority Backlog 📥)
EBernhardson moved T332953: Migrate PipelineLib repos to GitLab from needs triage to Current work on the Discovery-Search board.
Apr 24 2023, 3:14 PM · Data-Platform-SRE, Discovery-Search (Current work), API Platform, Shared-Data-Infrastructure, Patch-For-Review, Data Pipelines, wdwb-tech, Wikidata, Security-Team, SRE, Wikidata-Campsite, Anti-Harassment, Wikispeech, Structured-Data-Backlog, Platform Engineering, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Editing-team, Content-Transform-Team, Metrics-Platform-Planning, Machine-Learning-Team, Cloud-Services, GitLab (Project Migration), Release-Engineering-Team (Priority Backlog 📥)

Apr 20 2023

EBernhardson added a comment to T333183: Migrate cindy-the-browser-test-bot to a docker based runner.

Progress so far:

Apr 20 2023, 4:18 PM · MW-1.41-notes (1.41.0-wmf.9; 2023-05-15), Patch-For-Review, Discovery-Search (Current work), CirrusSearch

Apr 10 2023

EBernhardson claimed T333183: Migrate cindy-the-browser-test-bot to a docker based runner.
Apr 10 2023, 7:33 PM · MW-1.41-notes (1.41.0-wmf.9; 2023-05-15), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
EBernhardson moved T333183: Migrate cindy-the-browser-test-bot to a docker based runner from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board.
Apr 10 2023, 7:32 PM · MW-1.41-notes (1.41.0-wmf.9; 2023-05-15), Patch-For-Review, Discovery-Search (Current work), CirrusSearch
EBernhardson moved T323628: Optimize the WikibaseCirrusSearch elasticsearch mapping and filter query for non-english users from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Apr 10 2023, 7:32 PM · MW-1.41-notes (1.41.0-wmf.1; 2023-03-20), MW-1.40-notes (1.40.0-wmf.27; 2023-03-13), Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T328332: Add a new keyword to filter pages based on their "length".

Re-index is currently running, and mappings look as expected for the indices that have already completed reindexing. It's up to "r" and will probably take a few more days; with any luck we can ship the keyword in next week's train.

Apr 10 2023, 6:30 PM · MW-1.41-notes (1.41.0-wmf.9; 2023-05-15), MW-1.40-notes (1.40.0-wmf.24; 2023-02-20), Discovery-Search (Current work), CirrusSearch, GrowthExperiments-Homepage, Growth-Team
EBernhardson added a comment to T323628: Optimize the WikibaseCirrusSearch elasticsearch mapping and filter query for non-english users.

@TJones can you take a look again now? I see a number of results for a Korean search for 가마우지, but I'm not familiar with what it was returning before and can't be certain this is correct.

Apr 10 2023, 6:14 PM · MW-1.41-notes (1.41.0-wmf.1; 2023-03-20), MW-1.40-notes (1.40.0-wmf.27; 2023-03-13), Discovery-Search (Current work), CirrusSearch

Mar 31 2023

EBernhardson created T333697: decom an-airflow1001.
Mar 31 2023, 3:11 PM · Discovery-Search (Current work)

Mar 30 2023

EBernhardson moved T330936: Missing Cirrussearch dump (enwiki and wikidata) from In Progress to Needs review on the Discovery-Search (Current work) board.
Mar 30 2023, 6:52 PM · MW-1.41-notes (1.41.0-wmf.5; 2023-04-17), Discovery-Search (Current work), CirrusSearch, Dumps-Generation
EBernhardson claimed T330936: Missing Cirrussearch dump (enwiki and wikidata).
Mar 30 2023, 5:12 PM · MW-1.41-notes (1.41.0-wmf.5; 2023-04-17), Discovery-Search (Current work), CirrusSearch, Dumps-Generation
EBernhardson removed a project from T328497: Remove unnecessary targets definitions: Discovery-Search (Current work).
Mar 30 2023, 5:11 PM · Patch-For-Review, MW-1.41-notes (1.41.0-wmf.3; 2023-04-03), MW-1.40-notes (1.40.0-wmf.27; 2023-03-13), Wikidata, Performance-Team (Radar), Structured-Data-Backlog, Machine-Learning-Team, Data-Engineering, Timeless, Wikistories, All-and-every-Wikisource, MediaWiki-extensions-WikimediaEvents, MediaWiki-extensions-WikimediaBadges, WikiHiero, Wikidata.org, Wikibase-Quality-Constraints, WikibaseMediaInfo, VueTest, MediaWiki-extensions-UrlShortener, UploadWizard, MediaWiki-extensions-Translate, TitleBlacklist, EasyTimeline, TimedMediaHandler, SpamBlacklist, SDAW-SearchVue, MediaWiki-extensions-Score, MediaWiki-extensions-Quiz, MediaWiki-extensions-Phonos, ORES, MediaWiki-extensions-OAuth, NavigationTiming, SDAW-MediaSearch, MachineVision, MediaWiki-extensions-InputBox, MediaWiki-extensions-FlaggedRevs, MediaWiki-extensions-EventLogging, DismissableSiteNotice, DiscussionTools, MediaWiki-extensions-Disambiguator, patch-welcome, Citoid, Cite, CirrusSearch, CharInsert, MediaWiki-extensions-CentralNotice, MediaWiki-extensions-CentralAuth, MediaWiki-extensions-CategoryTree, BetaFeatures, ArticlePlaceholder, Advanced-Search, ConfirmEdit (CAPTCHA extension), Social-Tools, CampaignEvents, ChessBrowser, CodeEditor, ExternalGuidance, MediaWiki-extensions-GlobalWatchlist, QuizGame, MediaWiki-extensions-Screenplay, SyntaxHighlight, Two-Column-Edit-Conflict-Merge, UniversalLanguageSelector, WikiEditor, BlueSky, Metrolook, User-Jdlrobson, Technical-Debt (RW-Tech-Debt), Front-end-Standards-Group
EBernhardson moved T331580: Fix permissions in hdfs://analytics-hadoop/wmf/data/discovery from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Mar 30 2023, 5:09 PM · Data-Engineering, Discovery-Search (Current work), CirrusSearch
EBernhardson moved T330447: Migrate mediawiki_revision_recommendation_create.py from airflow 1 to airflow 2 from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mar 30 2023, 5:09 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T329880: Migrate search_satisfaction.py from airflow 1 to airflow 2 from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mar 30 2023, 5:09 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T329872: Migrate glent_weekly.py from airflow 1 to airflow 2 from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 30 2023, 5:09 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T329881: Migrate transfer_to_es.py from airflow 1 to airflow 2 from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 30 2023, 5:09 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning

Mar 27 2023

EBernhardson created T333183: Migrate cindy-the-browser-test-bot to a docker based runner.
Mar 27 2023, 3:38 PM · MW-1.41-notes (1.41.0-wmf.9; 2023-05-15), Patch-For-Review, Discovery-Search (Current work), CirrusSearch

Mar 22 2023

EBernhardson moved T329876: Migrate ores_predictions.py from airflow 1 to airflow 2 from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 22 2023, 8:27 PM · Patch-For-Review, Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning

Mar 20 2023

EBernhardson moved T330446: Migrate fulltext_head_queries.py from airflow 1 to airflow 2 from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 10:45 PM · Discovery-Search (Current work)
EBernhardson moved T329239: migrate mjolnir application and dag to airflow v2 and spark3 from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 10:35 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T329871: Migrate export_queries_to_relforge.py from airflow 1 to airflow 2 from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 10:35 PM · Patch-For-Review, Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T330448: Migrate process_sparql_query.py from airflow 1 to airflow 2 from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 10:30 PM · Patch-For-Review, Discovery-Search (Current work)
EBernhardson moved T329874: Migrate import_ttl.py from airflow 1 to airflow 2 from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 10:30 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T329878: Migrate query_clicks.py from airflow 1 to airflow 2 from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 10:30 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T329877: Migrate popularity_score.py from airflow 1 to airflow 2 from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 10:30 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T329872: Migrate glent_weekly.py from airflow 1 to airflow 2 from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mar 20 2023, 10:29 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T323616: Cleanup the /wmf/data/discovery/transfer_to_es folder in hdfs from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 5:51 PM · Discovery-Search (Current work), CirrusSearch
EBernhardson moved T330451: Migrate subgraph_and_query_mapping.py from airflow 1 to airflow 2 from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 5:43 PM · Discovery-Search (Current work)
EBernhardson moved T330452: Migrate subgraph_and_query_metrics.py from airflow 1 to airflow 2 from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 5:43 PM · Discovery-Search (Current work)
EBernhardson moved T329875: Migrate incoming_links.py from airflow 1 to airflow 2 from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 5:42 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T329873: Migrate import_cirrus_indexes.py from airflow 1 to airflow 2 from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 5:42 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T329870: Migrate drop_old_data_daily.py from airflow 1 to airflow 2 from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mar 20 2023, 5:42 PM · Patch-For-Review, Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson added a comment to T330936: Missing Cirrussearch dump (enwiki and wikidata).

We talked a bit about this; the plan right now is to prevent the library we use from disabling the last available connection. That should allow the retries to work as we expect regardless of the error type. This connection disabling seems better suited to setups with many instances in their pool than to a single DNS name backed by LVS.
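The proposed policy, never disabling the last available connection, can be sketched as a tiny connection pool. The client library in use has its own pool implementation; this hypothetical Python version only illustrates the rule, with Connection and ConnectionPool as invented names.

```python
from typing import List


class Connection:
    def __init__(self, host: str) -> None:
        self.host = host
        self.enabled = True


class ConnectionPool:
    """Pool that disables failing connections, but never the last enabled one."""

    def __init__(self, hosts: List[str]) -> None:
        self.connections = [Connection(h) for h in hosts]

    def enabled(self) -> List[Connection]:
        return [c for c in self.connections if c.enabled]

    def mark_failed(self, conn: Connection) -> None:
        # With a single LVS-backed name, disabling the only connection would
        # make every later retry fail immediately, so keep it usable instead.
        if conn.enabled and len(self.enabled()) > 1:
            conn.enabled = False

    def get(self) -> Connection:
        live = self.enabled()
        if not live:
            raise RuntimeError("no enabled connections")
        return live[0]
```

With a single hostname the pool keeps handing out the same connection after a failure, so the caller's retry logic still gets a chance to run; with several hosts, failing ones are still taken out of rotation.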

Mar 20 2023, 4:35 PM · MW-1.41-notes (1.41.0-wmf.5; 2023-04-17), Discovery-Search (Current work), CirrusSearch, Dumps-Generation
EBernhardson moved T331580: Fix permissions in hdfs://analytics-hadoop/wmf/data/discovery from Incoming to In Progress on the Discovery-Search (Current work) board.
Mar 20 2023, 4:34 PM · Data-Engineering, Discovery-Search (Current work), CirrusSearch
EBernhardson added a comment to T329877: Migrate popularity_score.py from airflow 1 to airflow 2.

Needs https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/304 to properly pass templated values to the submitted skein spec before this can run.

Mar 20 2023, 4:13 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T329877: Migrate popularity_score.py from airflow 1 to airflow 2 from In Progress to Needs review on the Discovery-Search (Current work) board.
Mar 20 2023, 4:13 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning

Mar 17 2023

EBernhardson updated subscribers of T332455: [beta cluster] Search - "An error has occurred while searching".

@bking has dealt with these issues before and might have ideas.

Mar 17 2023, 10:29 PM · Discovery-Search (Current work), Beta-Cluster-Infrastructure

Mar 13 2023

EBernhardson added a comment to T331580: Fix permissions in hdfs://analytics-hadoop/wmf/data/discovery.

The command to change rights is running.

To prevent that, we can remove write permission for the analytics-search-users group:

From:
drwxrwxr-x   - analytics-search analytics-search-users               0 2023-03-10 01:06 hdfs://analytics-hadoop/wmf/data/discovery

To:
drwxr-xr-x   - analytics-search analytics-search-users               0 2023-03-10 01:06 hdfs://analytics-hadoop/wmf/data/discovery

I just need confirmation from someone with more historical knowledge on whether users in the analytics-search-users group are expected to write to that folder.
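For reference, the proposed change only clears the group write bit: drwxrwxr-x is mode 775 and drwxr-xr-x is 755, which is what hdfs dfs -chmod g-w (or chmod 755) would produce on that directory. A small, hypothetical Python helper makes the arithmetic explicit; it decodes only the nine permission characters and ignores the leading file-type character.

```python
def mode_to_octal(mode: str) -> int:
    """Convert an ls-style mode string such as 'drwxr-xr-x' to its
    permission bits. The leading file-type character is ignored, and the
    setuid/setgid/sticky bits themselves are not decoded."""
    perms = mode[-9:]  # rwx for user, group, other
    value = 0
    for i, ch in enumerate(perms):
        # Lowercase 's'/'t' also imply the execute bit is set;
        # uppercase 'S'/'T' and '-' mean the bit is clear.
        if ch in "rwxst":
            value |= 1 << (8 - i)
    return value
```

For example, mode_to_octal("drwxrwxr-x") & ~mode_to_octal("drwxr-xr-x") leaves only 0o020, the group write bit.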

Mar 13 2023, 6:06 PM · Data-Engineering, Discovery-Search (Current work), CirrusSearch
EBernhardson moved T327970: Create airflow v2 instance and supporting repos for search platform from Needs review to Needs Reporting on the Discovery-Search (Current work) board.

Confirmed the instance seems to be working; the remaining updates are to be made in the data-engineering airflow-dags repo.

Mar 13 2023, 4:07 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning

Mar 9 2023

EBernhardson created T331580: Fix permissions in hdfs://analytics-hadoop/wmf/data/discovery.
Mar 9 2023, 12:16 AM · Data-Engineering, Discovery-Search (Current work), CirrusSearch

Mar 7 2023

EBernhardson created P45275 (An Untitled Masterwork).
Mar 7 2023, 8:13 PM
EBernhardson added a comment to T327970: Create airflow v2 instance and supporting repos for search platform.

Re https://gerrit.wikimedia.org/r/894740, we should ask @mforns @Milimetric @JAllemandou about this. I think there might be a better way? Maybe the logic to get MW db hostnames and ports should be moved out of refinery python? Or, wmfdata-python uses the refinery bin/analytics-mysql CLI. Perhaps the whole thing should move out of refinery so it is installable without deploying refinery?

Mar 7 2023, 3:38 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning

Mar 6 2023

EBernhardson claimed T329877: Migrate popularity_score.py from airflow 1 to airflow 2.
Mar 6 2023, 11:17 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson moved T329878: Migrate query_clicks.py from airflow 1 to airflow 2 from In Progress to Needs review on the Discovery-Search (Current work) board.
Mar 6 2023, 11:15 PM · Discovery-Search (Current work), Data Pipelines, Data-Engineering-Planning
EBernhardson added a comment to T330936: Missing Cirrussearch dump (enwiki and wikidata).

Took a closer look at the Wikidata failure, but I've turned up nothing. The output ends without any failure messages, which suggests to me that the process died, perhaps from a force kill or a segfault. I couldn't find anything correlated in the system logs. Sadly, syslog doesn't go back that far (the oldest syslog entry is Feb 27; this died on Feb 22), and there's no certainty it would have had useful information anyway.

Mar 6 2023, 6:13 PM · MW-1.41-notes (1.41.0-wmf.5; 2023-04-17), Discovery-Search (Current work), CirrusSearch, Dumps-Generation