
dcausse (David Causse)
User

User Details

User Since
Jun 9 2015, 9:03 AM (505 w, 2 d)
Availability
Available
IRC Nick
dcausse
LDAP User
DCausse
MediaWiki User
DCausse (WMF) [ Global Accounts ]

Recent Activity

Yesterday

dcausse closed T385005: Build all search platform plugins for opensearch 1.3.20, a subtask of T379312: Release packages for opensearch 1.3.20, as Resolved.
Thu, Feb 13, 6:35 PM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28)
dcausse closed T385005: Build all search platform plugins for opensearch 1.3.20 as Resolved.
Thu, Feb 13, 6:35 PM · Discovery-Search (2025.02.10 - 2025.02.28), CirrusSearch

Wed, Feb 12

dcausse moved T386005: CategoryChangesAsRdf::handleDeletes fails with Error 1176: Key 'rc_new_name_timestamp' doesn't exist in table 'recentchanges' from Incoming to Done on the Discovery-Search (2025.02.10 - 2025.02.28) board.
Wed, Feb 12, 8:31 AM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28), MediaWiki-Categories
dcausse closed T386005: CategoryChangesAsRdf::handleDeletes fails with Error 1176: Key 'rc_new_name_timestamp' doesn't exist in table 'recentchanges' as Resolved.

@Ladsgroup thanks for the quick fix! The scripts appear to have run properly this time, all alerts related to categories lag resolved today.

Wed, Feb 12, 8:30 AM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28), MediaWiki-Categories

Tue, Feb 11

dcausse added a comment to T385970: Update the article-country isvc to use Wikilinks for predictions.

@Isaac awesome, thanks! I'll get something ready this week and report back here.

Tue, Feb 11, 9:00 PM · Patch-For-Review, OKR-Work, Lift-Wing, Machine-Learning-Team
dcausse awarded T386026: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account a Love token.
Tue, Feb 11, 2:39 PM · User-bd808, wikitech.wikimedia.org
dcausse closed T359033: EPIC: Convert CirrusSearch metrics to statslib, a subtask of T343020: Converting MediaWiki Metrics to StatsLib, as Resolved.
Tue, Feb 11, 2:37 PM · SRE Observability (FY2024/2025-Q3), Patch-For-Review, Observability-Metrics
dcausse closed T359033: EPIC: Convert CirrusSearch metrics to statslib as Resolved.
Tue, Feb 11, 2:37 PM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28), MW-1.43-notes (1.43.0-wmf.13; 2024-07-09), Observability-Metrics, Epic, CirrusSearch
dcausse closed T374702: Cleanup: Remove deprecated weighted tag methods, a subtask of T366253: Create a generic stream to populate CirrusSearch weighted_tags, as Resolved.
Tue, Feb 11, 2:36 PM · Discovery-Search (Current work), CirrusSearch
dcausse closed T374702: Cleanup: Remove deprecated weighted tag methods as Resolved.
Tue, Feb 11, 2:35 PM · Discovery-Search (2025.02.10 - 2025.02.28), MW-1.44-notes (1.44.0-wmf.15; 2025-02-04), Technical-Debt, CirrusSearch
dcausse added a comment to T386026: Decide what to do with SUL attached Wikitech accounts that Bitu associates with a different SUL account.

Please detach 'DCausse' from SUL, rename it to 'DCausse (WMF)', and reattach to SUL.

Tue, Feb 11, 2:09 PM · User-bd808, wikitech.wikimedia.org
dcausse created T386098: Run a full data-reload on wdqs-main, wdqs-scholarly and wdqs to capture new blank node labels.
Tue, Feb 11, 2:01 PM · Data-Platform-SRE (2025.02.10 - 2025.02.28), Wikidata, Wikidata-Query-Service
dcausse created T386097: Re-enable drop_old_data_daily in airflow-search.
Tue, Feb 11, 1:49 PM · Discovery-Search
dcausse added a comment to T379312: Release packages for opensearch 1.3.20.

@brouberol sorry I forgot to link T385005 from here, plugins should be ready to be packaged in https://gerrit.wikimedia.org/r/c/operations/software/opensearch/plugins/+/1118553
stconvert has its own 1.3.x branch at https://gitlab.wikimedia.org/repos/search-platform/opensearch-analysis-stconvert/-/tree/1.3.x?ref_type=heads

Tue, Feb 11, 1:36 PM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28)
dcausse added a parent task for T385005: Build all search platform plugins for opensearch 1.3.20: T379312: Release packages for opensearch 1.3.20.
Tue, Feb 11, 1:35 PM · Discovery-Search (2025.02.10 - 2025.02.28), CirrusSearch
dcausse added a subtask for T379312: Release packages for opensearch 1.3.20: T385005: Build all search platform plugins for opensearch 1.3.20.
Tue, Feb 11, 1:35 PM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28)
dcausse added a comment to T386066: Implement `doSearchTitle` for CirrusSearch to allow searching just by title.

Yes, I did consider that, @dcausse, but I believe fully supporting this functionality in CirrusSearch offers a more intuitive and maintainable solution than simply prepending intitle: to every word. For example, to search for "cirrus search test" in the Nuke title field, we'd have to write the Cirrus query as intitle:cirrus intitle:search intitle:test, which feels cumbersome and inelegant. By implementing a dedicated doSearchTitle method (as I’ve done in the patch with separate code paths), we can provide a future-proof solution that allows us to fine-tune key aspects—such as weighting and scoring—specifically for title searches versus full-text queries.
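For illustration only, the workaround discussed in this comment (prepending intitle: to every word) can be sketched in a few lines of Python. The helper name `to_intitle_query` is hypothetical and is not part of CirrusSearch; it splits naively on whitespace and ignores quoting or other search keywords.

```python
def to_intitle_query(text: str) -> str:
    """Prefix every whitespace-separated word with the intitle: keyword.

    Hypothetical helper: naive splitting, no handling of quotes or
    of keywords already present in the input.
    """
    return " ".join(f"intitle:{word}" for word in text.split())
```

Calling `to_intitle_query("cirrus search test")` yields the query quoted in the comment, `intitle:cirrus intitle:search intitle:test`.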

Tue, Feb 11, 1:02 PM · Discovery-Search (2025.02.10 - 2025.02.28), Patch-For-Review, CirrusSearch
dcausse created T386068: Implement articlecountry a new CirrusSearch keyword.
Tue, Feb 11, 10:12 AM · Discovery-Search (2025.02.10 - 2025.02.28), Research
dcausse added a comment to T386066: Implement `doSearchTitle` for CirrusSearch to allow searching just by title.

@MolecularPilot have you considered using the intitle: search keyword for achieving this use-case? (c.f. https://www.mediawiki.org/wiki/Help:CirrusSearch#Intitle_and_incategory)

Tue, Feb 11, 9:37 AM · Discovery-Search (2025.02.10 - 2025.02.28), Patch-For-Review, CirrusSearch
dcausse added a comment to T385970: Update the article-country isvc to use Wikilinks for predictions.

@Isaac we have some utilities to push weighted tags into the search index from a Spark job. 43M is a lot, and we should be careful not to slow down real-time indexing while doing so. I'll prepare something and update the weighted tags documentation so that we can easily re-use it for future work.
Unfortunately our system requires a bit more info than what you have in your CSV:

  • namespace_id
  • page_title
  • page_id
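As a rough sketch of the requirement above, a CSV could be checked for the three listed fields before any import is attempted. This assumes the fields appear as header columns with exactly these names; the function name `missing_columns` and the sample `country` column are hypothetical and not taken from the actual dataset or tooling.

```python
import csv
import io

# Fields named in the comment above; assumed here to be CSV header names.
REQUIRED_COLUMNS = {"namespace_id", "page_title", "page_id"}


def missing_columns(csv_text: str) -> set:
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # fieldnames is None for empty input, hence the fallback to [].
    return REQUIRED_COLUMNS - set(reader.fieldnames or [])
```

An empty result means the header carries all three fields; anything else lists what still has to be added.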
Tue, Feb 11, 9:32 AM · Patch-For-Review, OKR-Work, Lift-Wing, Machine-Learning-Team

Mon, Feb 10

dcausse added a comment to T386005: CategoryChangesAsRdf::handleDeletes fails with Error 1176: Key 'rc_new_name_timestamp' doesn't exist in table 'recentchanges'.

Related to T270033 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/667358 specifically I think

Mon, Feb 10, 5:59 PM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28), MediaWiki-Categories
dcausse moved T385005: Build all search platform plugins for opensearch 1.3.20 from In Progress to Needs review on the Discovery-Search (Current work) board.
Mon, Feb 10, 5:27 PM · Discovery-Search (2025.02.10 - 2025.02.28), CirrusSearch
dcausse added a project to T386005: CategoryChangesAsRdf::handleDeletes fails with Error 1176: Key 'rc_new_name_timestamp' doesn't exist in table 'recentchanges': Data-Platform-SRE.
Mon, Feb 10, 4:49 PM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28), MediaWiki-Categories
dcausse updated the task description for T386005: CategoryChangesAsRdf::handleDeletes fails with Error 1176: Key 'rc_new_name_timestamp' doesn't exist in table 'recentchanges'.
Mon, Feb 10, 4:48 PM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28), MediaWiki-Categories
dcausse moved T374702: Cleanup: Remove deprecated weighted tag methods from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mon, Feb 10, 4:30 PM · Discovery-Search (2025.02.10 - 2025.02.28), MW-1.44-notes (1.44.0-wmf.15; 2025-02-04), Technical-Debt, CirrusSearch
dcausse created T386005: CategoryChangesAsRdf::handleDeletes fails with Error 1176: Key 'rc_new_name_timestamp' doesn't exist in table 'recentchanges'.
Mon, Feb 10, 3:43 PM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28), MediaWiki-Categories
dcausse claimed T385005: Build all search platform plugins for opensearch 1.3.20.
Mon, Feb 10, 7:28 AM · Discovery-Search (2025.02.10 - 2025.02.28), CirrusSearch
dcausse moved T385005: Build all search platform plugins for opensearch 1.3.20 from needs triage to Current work on the Discovery-Search board.
Mon, Feb 10, 7:28 AM · Discovery-Search (2025.02.10 - 2025.02.28), CirrusSearch

Thu, Feb 6

dcausse awarded T382295: Create event stream for article-country model-server hosted on LiftWing a Love token.
Thu, Feb 6, 9:13 AM · OKR-Work, Lift-Wing, Machine-Learning-Team

Tue, Feb 4

dcausse added a comment to T382295: Create event stream for article-country model-server hosted on LiftWing.

@Ottomata and @dcausse, please let us know whether we should proceed to production with these events.

For production, we will use mediawiki.article_country_prediction_change.v1 instead of mediawiki.page_prediction_change.rc0, while mediawiki.cirrussearch.page_weighted_tags_change.rc0 will remain unchanged.

Tue, Feb 4, 8:11 AM · OKR-Work, Lift-Wing, Machine-Learning-Team

Wed, Jan 29

dcausse added a subtask for T341553: Allow running one-off scripts manually: T382398: Mediawiki maint scripts using service proxied by the tls proxy might fail when running with mwscript-k8s.
Wed, Jan 29, 11:20 AM · MW-on-K8s, serviceops
dcausse added a parent task for T382398: Mediawiki maint scripts using service proxied by the tls proxy might fail when running with mwscript-k8s: T341553: Allow running one-off scripts manually.
Wed, Jan 29, 11:20 AM · serviceops
dcausse moved T374702: Cleanup: Remove deprecated weighted tag methods from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Wed, Jan 29, 9:55 AM · Discovery-Search (2025.02.10 - 2025.02.28), MW-1.44-notes (1.44.0-wmf.15; 2025-02-04), Technical-Debt, CirrusSearch
dcausse renamed T379312: Release packages for opensearch 1.3.20 from Release packages for latest opensearch 1 release to Release packages for opensearch 1.3.20.
Wed, Jan 29, 9:45 AM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28)
dcausse renamed T385005: Build all search platform plugins for opensearch 1.3.20 from Build all search platform plugin for opensearch 1.3.20 to Build all search platform plugins for opensearch 1.3.20.
Wed, Jan 29, 9:44 AM · Discovery-Search (2025.02.10 - 2025.02.28), CirrusSearch
dcausse created T385005: Build all search platform plugins for opensearch 1.3.20.
Wed, Jan 29, 9:44 AM · Discovery-Search (2025.02.10 - 2025.02.28), CirrusSearch

Tue, Jan 28

dcausse added a comment to T375821: Migrate streaming updater event schema to the standard schema repository.

I'm not sure we can safely change the schema_title of an existing stream so we might have to create separate streams and adapt the pipeline to consume from multiple update streams.

Tue, Jan 28, 2:05 PM · Discovery-Search (2025.02.10 - 2025.02.28), Patch-For-Review
dcausse moved T374702: Cleanup: Remove deprecated weighted tag methods from In Progress to Needs review on the Discovery-Search (Current work) board.
Tue, Jan 28, 10:34 AM · Discovery-Search (2025.02.10 - 2025.02.28), MW-1.44-notes (1.44.0-wmf.15; 2025-02-04), Technical-Debt, CirrusSearch
dcausse added a comment to T379312: Release packages for opensearch 1.3.20.

We need to verify the current plugins package will work with 1.3.20 and update it if not.

Tue, Jan 28, 8:42 AM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28)

Mon, Jan 27

dcausse closed T384437: Few SLIS left in the search index, a subtask of T340437: [EPIC] Image suggestions data pipelines maintenance, as Resolved.
Mon, Jan 27, 2:44 PM · Epic, Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Topics, Essential-Work, Section-Level-Image-Suggestions
dcausse closed T384437: Few SLIS left in the search index as Resolved.

A dataset to fix missing suggestions has been imported and I confirm seeing 89k of them on eswiki.

Mon, Jan 27, 2:44 PM · Section-Level-Image-Suggestions, Structured-Data-Backlog (Current Work)
dcausse closed T384587: Failed Wikidata upstream dependency disrupted image suggestions, a subtask of T340437: [EPIC] Image suggestions data pipelines maintenance, as Resolved.
Mon, Jan 27, 2:43 PM · Epic, Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Topics, Essential-Work, Section-Level-Image-Suggestions
dcausse closed T384587: Failed Wikidata upstream dependency disrupted image suggestions as Resolved.

The dataset has been imported and I confirm seeing 89k suggestions on eswiki.

Mon, Jan 27, 2:43 PM · Image-Suggestions, Structured-Data-Backlog (Current Work)
dcausse added a comment to T382295: Create event stream for article-country model-server hosted on LiftWing.

Thanks all for working this out! I know a lot of moving parts here so I appreciate the work to figure out the best approach and who owns what piece. Just to make sure I understand (for this project and future streams):

  • This mediawiki.cirrussearch.page_weighted_tags_change.rc0 stream is now the interface point between what LiftWing outputs and what goes into Search. Beyond matching that standard schema around page ID/title, that also means we need to define the tag prefix now. There's already a fair bit of code written on CirrusSearch for handling articletopic-related inputs so presumably we want to build on that because article-country is closely related. The existing prefixes are classification.ores.articletopic (this model) and classification.ores.drafttopic (this model). I would suggest not using either of those because we may want the ability to e.g., flush out one set of predictions due to a model update/deprecation without affecting the others. It looks like Kevin had been going with liftwing.test-article-country-events on staging and if I merge that with the existing Search tag norms, it sounds like classification.liftwing.articlecountry maybe is a good choice?
Mon, Jan 27, 10:08 AM · OKR-Work, Lift-Wing, Machine-Learning-Team
dcausse added a comment to T382295: Create event stream for article-country model-server hosted on LiftWing.
  • According to the Stream and the schema definition both the page_id and the page_title are required. This will require a modification to the model server as it currently processes the page_title while making a request. One option would be the ability to make a request either using a page_title or a page_id and use the latter in this use case. Alternatively we'd have to get the page_id within the model server by querying the mediawiki api.

page_title and page_id should already be part of the mediawiki.page_change.v1 events. IIRC the outlink model seems to have access to the whole event; could the same technique be used here to avoid fetching something additional via the mw-api?

Mon, Jan 27, 9:37 AM · OKR-Work, Lift-Wing, Machine-Learning-Team
dcausse created T384805: Unable to trigger dag with config.
Mon, Jan 27, 8:20 AM · Data-Engineering-Radar, Data-Platform-SRE (2025.02.10 - 2025.02.28), Patch-For-Review, Data-Engineering

Thu, Jan 23

dcausse moved T374921: Configure https://stream.wikimedia.org to expose rdf-streaming-updater.mutation from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Thu, Jan 23, 4:45 PM · Data-Engineering-Radar, Event-Platform, Data-Engineering, Discovery-Search (Current work), Wikidata
dcausse updated the task description for T382065: Add support for active/active double compute streams in the EventStreams HTTP service.
Thu, Jan 23, 3:37 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service, EventStreams
dcausse moved T382065: Add support for active/active double compute streams in the EventStreams HTTP service from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Thu, Jan 23, 2:50 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service, EventStreams
dcausse moved T374919: Adapt the rdf-streaming-updater flink job to use wikimedia-eventutilities-flink from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Thu, Jan 23, 1:18 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata

Tue, Jan 21

dcausse awarded T384344: Wikibase/Wikidata and WDQS disagree about statement, reference and value namespace prefixes a Love token.
Tue, Jan 21, 5:23 PM · MW-1.44-notes (1.44.0-wmf.16; 2025-02-11), Patch-For-Review, Wikidata Dev Team (Wikidata.org Slice), MediaWiki-extensions-WikibaseRepository, Wikidata Query UI, Wikidata
dcausse moved T373459: SUP: set up alerting for page_change_weighted_tags ingestion from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Tue, Jan 21, 3:54 PM · Discovery-Search (Current work), CirrusSearch, Growth-Team
dcausse created T384326: The rdf-streaming-updater should support reading its config from a yaml file.
Tue, Jan 21, 2:28 PM · Wikidata, Wikidata-Query-Service
dcausse moved T359033: EPIC: Convert CirrusSearch metrics to statslib from Epics to To Be Deployed on the Discovery-Search (Current work) board.

Starting from MW 1.44.0-wmf.14 CirrusSearch should no longer push any metrics to graphite.

Tue, Jan 21, 10:46 AM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28), MW-1.43-notes (1.43.0-wmf.13; 2024-07-09), Observability-Metrics, Epic, CirrusSearch
dcausse moved T369148: Replace usage of StatsdDataFactory with StatsFactory from Needs Reporting to To Be Deployed on the Discovery-Search (Current work) board.
Tue, Jan 21, 10:41 AM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.14; 2025-01-28), Patch-For-Review, Observability-Metrics, CirrusSearch
dcausse moved T369148: Replace usage of StatsdDataFactory with StatsFactory from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Tue, Jan 21, 10:41 AM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.14; 2025-01-28), Patch-For-Review, Observability-Metrics, CirrusSearch

Mon, Jan 20

dcausse moved T374702: Cleanup: Remove deprecated weighted tag methods from Needs review to In Progress on the Discovery-Search (Current work) board.
Mon, Jan 20, 6:52 PM · Discovery-Search (2025.02.10 - 2025.02.28), MW-1.44-notes (1.44.0-wmf.15; 2025-02-04), Technical-Debt, CirrusSearch
dcausse moved T374702: Cleanup: Remove deprecated weighted tag methods from In Progress to Needs review on the Discovery-Search (Current work) board.
Mon, Jan 20, 6:51 PM · Discovery-Search (2025.02.10 - 2025.02.28), MW-1.44-notes (1.44.0-wmf.15; 2025-02-04), Technical-Debt, CirrusSearch

Fri, Jan 17

dcausse claimed T375821: Migrate streaming updater event schema to the standard schema repository.
Fri, Jan 17, 4:24 PM · Discovery-Search (2025.02.10 - 2025.02.28), Patch-For-Review
dcausse moved T369148: Replace usage of StatsdDataFactory with StatsFactory from In Progress to Needs review on the Discovery-Search (Current work) board.
Fri, Jan 17, 3:27 PM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.14; 2025-01-28), Patch-For-Review, Observability-Metrics, CirrusSearch

Wed, Jan 15

dcausse added a comment to T369079: Update `UniqueValueChecker` to query a list of endpoints.

Hm, I think one consequence of the current broken query is also that distinct-values constraints are not checked on Commons (or rather, never return a result). Depending on how we implement the code changes, it might start to return Wikidata entities with the same value on Commons constraint check – but, unless we configure an authentication-less WCQS endpoint in the production config, it can never find other Commons (MediaInfo) entities.

My feeling is that such results would almost certainly not be useful. (Finding other Commons entities with the same value might be useful in some cases, though that’s maybe also worth discussing on-wiki.) So for now I think it might be best to completely disable the distinct-values constraint type on Commons, by mapping it to a non-existing item ID (similar to what we do for type constraints – see wgWBQualityConstraintsTypeConstraintId in IS.php). What do you think?

Wed, Jan 15, 6:41 PM · Patch-For-Review, MW-1.43-notes (1.43.0-wmf.19; 2024-08-20), Wikidata Dev Team (Wikidata.org Slice), Wikibase-Quality-Constraints, Wikidata
dcausse claimed T369148: Replace usage of StatsdDataFactory with StatsFactory.
Wed, Jan 15, 10:41 AM · Discovery-Search, MW-1.44-notes (1.44.0-wmf.14; 2025-01-28), Patch-For-Review, Observability-Metrics, CirrusSearch
dcausse added a comment to T383589: Fix skein/spark memory unit missfit.

@Gehel @JAllemandou @dcausse What do you think the priority here should be?

Is there a related task?

Wed, Jan 15, 8:57 AM · Data-Engineering

Jan 14 2025

dcausse moved T373459: SUP: set up alerting for page_change_weighted_tags ingestion from In Progress to Needs review on the Discovery-Search (Current work) board.
Jan 14 2025, 6:45 PM · Discovery-Search (Current work), CirrusSearch, Growth-Team
dcausse moved T373459: SUP: set up alerting for page_change_weighted_tags ingestion from Ready for Dev -- SWE to In Progress on the Discovery-Search (Current work) board.
Jan 14 2025, 4:56 PM · Discovery-Search (Current work), CirrusSearch, Growth-Team
dcausse reopened T369079: Update `UniqueValueChecker` to query a list of endpoints, a subtask of T337013: [Epic] Splitting the graph in WDQS, as Open.
Jan 14 2025, 4:51 PM · Discovery-Search (2025.02.10 - 2025.02.28), Epic, Wikidata-Query-Service, Wikidata
dcausse reopened T369079: Update `UniqueValueChecker` to query a list of endpoints as "Open".

Re-opening because the approach taken might not work if the duplicates are spread across the list of endpoints; please see T374021#10458231 for a possible solution.

Jan 14 2025, 4:51 PM · Patch-For-Review, MW-1.43-notes (1.43.0-wmf.19; 2024-08-20), Wikidata Dev Team (Wikidata.org Slice), Wikibase-Quality-Constraints, Wikidata
dcausse added a comment to T374021: Make WikibaseQualityConstraints use split-graph query service.

I believe it was originally done this way so that we didn’t have to implement serializing the statement value into the query (though later I added getRdfLiteral() anyway, which we should definitely remove and instead use Wikibase’s RdfBuilder infrastructure). But this doesn’t work for a split query service. We need to refactor the query to not rely on the original entity, and instead directly serialize the statement value into it after all.

Jan 14 2025, 3:23 PM · Discovery-Search (2025.02.10 - 2025.02.28), Data-Platform-SRE (2025.02.10 - 2025.02.28), User-ItamarWMDE, wmde-wikidata-tech, Wikibase-Quality-Constraints, Wikidata

Jan 13 2025

dcausse moved T373459: SUP: set up alerting for page_change_weighted_tags ingestion from Blocked/Waiting to Ready for Dev -- SWE on the Discovery-Search (Current work) board.
Jan 13 2025, 7:51 PM · Discovery-Search (Current work), CirrusSearch, Growth-Team
dcausse added a comment to T373459: SUP: set up alerting for page_change_weighted_tags ingestion.

@Michael I think this is relatively stable now. Since search does not own the individual sources of tags, I think it might be better to have more fine-grained alerts (per tag?) on your side if you want. On our side I might set a very broad alert to capture only obvious problems (i.e. no tags updated in the last hour), but it might not cover failures specific to a particular tag.

Jan 13 2025, 7:49 PM · Discovery-Search (Current work), CirrusSearch, Growth-Team
dcausse moved T379046: WeightedTagsUpdater should indicate success of the update from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Jan 13 2025, 7:45 PM · MW-1.44-notes (1.44.0-wmf.4; 2024-11-19), Discovery-Search (Current work), Add-Link, CirrusSearch
dcausse moved T376440: Deepcategory search does not show any results on Commons instead of results up to the configured limits from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Jan 13 2025, 7:44 PM · MW-1.44-notes (1.44.0-wmf.5; 2024-11-25), Discovery-Search (Current work), CirrusSearch, Commons
dcausse moved T271776: Allow limiting lexeme searches by language from Wikibase Search to needs triage on the Discovery-Search board.

@Jdforrester-WMF ack, thanks.

Jan 13 2025, 5:53 PM · Discovery-Search (2025.02.10 - 2025.02.28), CirrusSearch, Wikidata, Wikidata Lexicographical data
dcausse added a comment to T271776: Allow limiting lexeme searches by language.

@DMartin-WMF regarding inlanguage:en also matching Q1860, I'm not sure about the implementation details, and it is possible that CirrusSearch would have to do some lookups too (or have a map defined in its config) if it provided such a feature. First I wanted to know if re-using inlanguage would be OK, since the description explicitly asked for a new keyword haslang.

Jan 13 2025, 5:30 PM · Discovery-Search (2025.02.10 - 2025.02.28), CirrusSearch, Wikidata, Wikidata Lexicographical data
dcausse moved T375387: Include fulltext search results Page Previews of sufficient dwell time in Search Metrics dashboard from Ready for Dev -- SWE to Needs review on the Discovery-Search (Current work) board.
Jan 13 2025, 4:34 PM · MW-1.43-notes (1.43.0-wmf.27; 2024-10-15), Web Team Essential Work 2025, Patch-For-Review, Discovery-Search (Current work)

Jan 10 2025

dcausse closed T156037: Load cirrussearch data into druid as Declined.

we have this data in relforge so I don't think we'll need this in druid, but feel free to re-open if we believe it might still be useful.

Jan 10 2025, 8:46 AM · Discovery-Search (Current work), Data-Engineering-Icebox, Data-Engineering, Analytics-Radar, Discovery-ARCHIVED, CirrusSearch
dcausse closed T119897: Create cron on 1002 to remove CirrusSearchRequest partitions as Declined.

this dataset is cleaned by airflow now

Jan 10 2025, 8:23 AM · Data-Engineering-Icebox, Data-Engineering, Analytics-Radar

Jan 9 2025

dcausse added a comment to T375387: Include fulltext search results Page Previews of sufficient dwell time in Search Metrics dashboard.

Updated https://superset.wikimedia.org/superset/dashboard/530 with minimal info in the Fulltext Abandonment chart. If I understood the metric properly, only ~1% of sessions get a virtual PV without a click; I don't know if this is enough to consider it a meaningful interaction, nor if these could be considered successful sessions.
It is not entirely obvious how to integrate this data into the current charts of the dashboard; it might make sense to have a "Fulltext Engagement" section where we could graph the various ways users interact with the SERP.

Jan 9 2025, 6:00 PM · MW-1.43-notes (1.43.0-wmf.27; 2024-10-15), Web Team Essential Work 2025, Patch-For-Review, Discovery-Search (Current work)
dcausse added a comment to T382065: Add support for active/active double compute streams in the EventStreams HTTP service.

Deployed Eventstreams v0.10.0 on beta and it throws this error when listening to a stream:

{"message":"No topics available for consumption. This likely means that the configured allowedTopics do not currently exist.","origin":"KafkaSSE","name":"ConfigurationError","allowedTopics":[null],"statusCode":500}
Jan 9 2025, 2:36 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service, EventStreams
dcausse updated the task description for T383333: Add gmodena to analytics-search-users.
Jan 9 2025, 2:22 PM · Discovery-Search (Current work), Data-Platform-SRE (2024.11.30 - 2024.12.20)
dcausse created T383333: Add gmodena to analytics-search-users.
Jan 9 2025, 2:19 PM · Discovery-Search (Current work), Data-Platform-SRE (2024.11.30 - 2024.12.20)
dcausse added a comment to T168973: Checking if a book is an instance of work is slow without explicit gearing hint.

I was asking myself how I can understand that this is not affected by the graph-split. Do we know that the query will lead to an empty result set if not executed on the primary cluster, or is that something specific to the first split?

This ticket is describing a problem in the blazegraph's query planner but I believe that you are interested in the impact of the split on Wikibase-Quality-Constraints, please see T355298 for this.
Regarding this particular query, as explained in Internal_Federation_Guide, property paths have to be made explicit if one link crosses multiple subgraphs. In this case, indeed, the link ?entity wdt:P31 ?class might cross subgraphs.
But for Quality-Constraints my understanding is that this query has evolved to only check the class hierarchy and is now more like:

ASK { ?classOfTheEntity wdt:P279* wd:Q386724. hint:Prior hint:gearing "forward". }

Where ?classOfTheEntity is injected from the PHP side (which is good, Wikibase-Quality-Constraints should not assume that the data of the entity it is checking is already available in the sparql backend).
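To make the injection step concrete, here is a minimal sketch in Python (the real code lives on the PHP side and is not reproduced here). The function name `build_subclass_ask` and the ID validation are assumptions added for illustration; only the query shape comes from the comment above.

```python
import re

# Item IDs are assumed to look like "Q" followed by a positive integer.
ITEM_ID = re.compile(r"Q[1-9][0-9]*")


def build_subclass_ask(class_id: str, target_id: str = "Q386724") -> str:
    """Build the ASK query with the entity's class injected by the caller."""
    for item_id in (class_id, target_id):
        # Reject anything that is not a plain item ID before it reaches SPARQL.
        if not ITEM_ID.fullmatch(item_id):
            raise ValueError(f"not an item ID: {item_id!r}")
    return (
        f"ASK {{ wd:{class_id} wdt:P279* wd:{target_id}. "
        'hint:Prior hint:gearing "forward". }'
    )
```

Because the class is supplied by the caller, the query never needs the checked entity's own triples to be present in the SPARQL backend. In `build_subclass_ask("Q571")`, Q571 is just an arbitrary example ID.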

Jan 9 2025, 8:55 AM · WDQS-Optimizer, Upstream, Discovery-ARCHIVED, Wikidata-Query-Service, Wikibase-Quality-Constraints, Wikibase-Quality, Wikidata

Jan 8 2025

dcausse added a comment to T381388: Search with interwiki prefix ignores an anchor.

@MSantos I can certainly attempt to write a patch and continue the discussion in gerrit but perhaps before doing so I wanted to know if:

  • is it on purpose that the fragment is omitted when building the Special:GoToInterwiki from Title#getFullUrlForRedirect()?
  • if not, would there be any objections to propagating the Title fragment when building the Special:GoToInterwiki link, and are there any special considerations to take into account when doing so?
Jan 8 2025, 5:07 PM · Discovery-Search (2025.02.10 - 2025.02.28), MediaWiki-Engineering, CirrusSearch
dcausse added a comment to T348943: Deploy multi-tenant OpenSearch cluster as replacement for Elasticsearch.

Linking T379288 since we might explore security features too for the (upcoming) WMF internal opensearch cluster used for search.

Jan 8 2025, 4:25 PM · Epic, cloud-services-team, Elasticsearch, Toolforge
dcausse added a project to T381388: Search with interwiki prefix ignores an anchor: MediaWiki-Engineering.

The indirection via Special:GoToInterwiki was added in T122209: Special:Search allows redirects to any interwiki link.
I'm not clear if omitting the fragment was intentional or not, I believe that it might be possible to propagate it but I'm unclear on the consequences.
In the case of local interwiki I think the fragment will be propagated by the browser (if not specified in the Location Header): https://de.wikipedia.org/wiki/Special:GoToInterwiki/en:Wikipedia#History (works for me with firefox)
In the case of external interwiki I think we might have to adapt the splash page to reconstruct it in the link presented to the user (via JavaScript?)
Special consideration might have to be made for external interwiki links that already declare a fragment, e.g. https://en.wikipedia.org/wiki/Special:GoToInterwiki/mixnmatch:collection where it might be impossible to propagate the user provided fragment.

Jan 8 2025, 1:51 PM · Discovery-Search (2025.02.10 - 2025.02.28), MediaWiki-Engineering, CirrusSearch
dcausse created T383218: Mjolnir is sometimes stuck in feature selection.
Jan 8 2025, 1:21 PM · Discovery-Search (2025.02.10 - 2025.02.28), CirrusSearch

Jan 7 2025

dcausse added a comment to T381388: Search with interwiki prefix ignores an anchor.

This seems to be working on Vector 2022

It fails for me in all tested skins and browsers, e.g. logged out with Vector 2022 and Firefox at https://simple.wikipedia.org.

Jan 7 2025, 3:34 PM · Discovery-Search (2025.02.10 - 2025.02.28), MediaWiki-Engineering, CirrusSearch
dcausse closed T282823: srnamespace parameter is ignored when srsearch value begins with "Namespace:", in requests to action 'query' list 'search' as Declined.

This is the expected behavior: in some conditions the search syntax is allowed to escape the namespace filtering provided by other means, such as the UI or query parameters.
This happens for:

  • the case you mention with a namespace followed by a colon
  • the prefix keyword which also allows passing a namespace prefix

This is documented at https://www.mediawiki.org/wiki/Help:CirrusSearch#Prefix_and_namespace

Jan 7 2025, 2:53 PM · Discovery-Search (Current work), MediaWiki-Search, MediaWiki-Action-API
dcausse added a comment to T382620: The Search/articletopic page at Wikitech appears to be out of date.

Thanks for the update, this is very useful to know. The API works for me, and I can also see the data in the new stream. FTR, I wasn't building any automation around topics, I was checking a possible bug in GrowthExperiments and wanted to know what the topics are.

Jan 7 2025, 11:29 AM · Discovery-Search (Current work), Lift-Wing, Documentation
dcausse claimed T382620: The Search/articletopic page at Wikitech appears to be out of date.

Thanks for pointing this out, you were correct: mediawiki.page_outlink_topic_prediction_change.v1 is indeed the new stream being populated and used by the search update pipeline. I updated the doc with new links and stream names. I think that predictions can be run by calling Lift Wing, as described at https://meta.wikimedia.org/wiki/Machine_learning_models/Production/Language_agnostic_link-based_article_topic:

curl https://api.wikimedia.org/service/lw/inference/v1/models/outlink-topic-model:predict -X POST -d '{"page_title": "Frida_Kahlo", "lang": "en", "threshold": 0.1}' -H "Content-type: application/json"

This should generate the predictions, but I believe that using pre-computed predictions from Hive, where possible, is better.
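
For scripted use, the same request can be sketched in Python with only the standard library. The endpoint and payload are taken from the curl command above; the function names build_request and predict_topics are illustrative, and the shape of the JSON response is not assumed here.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
LIFTWING_URL = (
    "https://api.wikimedia.org/service/lw/inference/v1/"
    "models/outlink-topic-model:predict"
)


def build_request(page_title: str, lang: str, threshold: float = 0.1) -> urllib.request.Request:
    """Build (but do not send) the POST request for one page."""
    payload = {"page_title": page_title, "lang": lang, "threshold": threshold}
    return urllib.request.Request(
        LIFTWING_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_topics(page_title: str, lang: str, threshold: float = 0.1):
    """Send the request and return the decoded JSON body as-is."""
    req = build_request(page_title, lang, threshold)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling predict_topics("Frida_Kahlo", "en") would issue the same request as the curl command. Building the request is kept separate from sending it so the payload can be inspected without network access.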

Jan 7 2025, 9:47 AM · Discovery-Search (Current work), Lift-Wing, Documentation

Jan 6 2025

dcausse created T383074: The CirrusSearch Saneitizer should support weighted_tags.
Jan 6 2025, 5:25 PM · Discovery-Search, CirrusSearch
dcausse added a comment to T347973: Option to Exclude Disambiguation Pages in Search API Endpoint.

The example mentions Wikidata. There, one can exclude items referring to disambiguation pages by adding -haswbstatement:P31=Q4167410 to the search query: https://www.wikidata.org/w/api.php?action=query&list=search&srsearch=test+-haswbstatement%3AP31%3DQ4167410&format=json&formatversion=2 (this excludes items that are instances of Wikimedia disambiguation page).
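
As a small sketch, the URL above can be assembled in Python so that the keyword is encoded correctly. The function name build_search_url and its parameters are illustrative; the API parameters themselves are the ones from the URL in this comment.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(terms: str, excluded_statement: str = "P31=Q4167410") -> str:
    """Build a list=search URL that excludes items carrying a given statement.

    The default excludes instances of Wikimedia disambiguation page.
    """
    params = {
        "action": "query",
        "list": "search",
        # A leading "-" negates the haswbstatement keyword.
        "srsearch": f"{terms} -haswbstatement:{excluded_statement}",
        "format": "json",
        "formatversion": "2",
    }
    return f"{API_URL}?{urlencode(params)}"


print(build_search_url("test"))
```

urlencode takes care of escaping the colon and equals sign inside srsearch, which is easy to get wrong when writing the URL by hand.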

Jan 6 2025, 2:56 PM · Discovery-Search (Current work), CirrusSearch, MediaWiki-extensions-Disambiguator, MediaWiki-Action-API
dcausse moved T377546: MediaWiki CirrusSearch Saneitizer is fixing an abnormally high number of documents in cloudelastic from Ready for Dev -- SWE to Needs Reporting on the Discovery-Search (Current work) board.

Indeed, https://grafana-rw.wikimedia.org/d/2DIjJ6_nk/cirrussearch-saneitizer-historical-fix-rate?orgId=1 shows a bump in mid-October and is now back to "normal". The cause is unknown and might be hard to investigate now, so boldly closing.

Jan 6 2025, 2:32 PM · Discovery-Search (Current work), CirrusSearch

Dec 20 2024

dcausse claimed T374921: Configure https://stream.wikimedia.org to expose rdf-streaming-updater.mutation.
Dec 20 2024, 3:02 PM · Data-Engineering-Radar, Event-Platform, Data-Engineering, Discovery-Search (Current work), Wikidata

Dec 19 2024

dcausse moved T326311: Deletion of Lexemes appears to leak triples related to its forms and senses from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Dec 19 2024, 4:16 PM · Discovery-Search (2025.02.10 - 2025.02.28), Patch-For-Review, Wikidata
dcausse moved T382065: Add support for active/active double compute streams in the EventStreams HTTP service from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Dec 19 2024, 4:14 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service, EventStreams

Dec 18 2024

dcausse moved T378382: Update cirrus-reindex-orchestrator for mwscript-on-k8s from Needs Reporting to Blocked/Waiting on the Discovery-Search (Current work) board.

Blocked by T382398

Dec 18 2024, 9:54 AM · Discovery-Search (2025.02.10 - 2025.02.28), MW-1.44-notes (1.44.0-wmf.8; 2024-12-17), Patch-For-Review
dcausse created T382398: Mediawiki maint scripts using service proxied by the tls proxy might fail when running with mwscript-k8s.
Dec 18 2024, 9:37 AM · serviceops

Dec 17 2024

dcausse added a comment to T375641: [ES-M3]: Implement label and aliases search for EntitySchemas via the wbsearchentities API.

@Lucas_Werkmeister_WMDE this should be done, only one schema is now missing an English label (E15): https://test.wikidata.org/w/index.php?search=EntitySchema%3A-haslabel%3Aen&title=Special:Search&profile=default&fulltext=1 (from the UI it's not clear whether it is empty or unset)

Dec 17 2024, 2:52 PM · EntitySchema (M3: EntitySchemas shown as labels instead of ID), Wikidata Dev Team (Wikidata.org Slice), Epic, Wikidata-Campsite, Wikidata
dcausse moved T378097: Investigation: why do statements on Senses and Forms not show up in searches using haswbstatement from Ready for Dev -- SWE to Needs Reporting on the Discovery-Search (Current work) board.
Dec 17 2024, 2:06 PM · MW-1.44-notes (1.44.0-wmf.6; 2024-12-03), Abstract Wikipedia team, Wikidata Lexicographical data, Discovery-Search (Current work), CirrusSearch, Wikidata