
dcausse (David Causse)
User

User Details

User Since
Jun 9 2015, 9:03 AM (457 w, 6 d)
Availability
Available
IRC Nick
dcausse
LDAP User
DCausse
MediaWiki User
DCausse (WMF) [ Global Accounts ]

Recent Activity

Fri, Mar 8

dcausse moved T355451: Update URLs on MediaWiki:Elastica-desc from Incoming to Needs review on the Discovery-Search (Current work) board.
Fri, Mar 8, 10:52 AM · MW-1.42-notes (1.42.0-wmf.22; 2024-03-12), Discovery-Search (Current work), Elasticsearch

Thu, Mar 7

dcausse added a comment to T359215: mediawiki_cirrussearch_request data is regularly late.

Discussed the issue today with @JAllemandou: the reason is that CirrusSearch might, in some circumstances, send these outdated events. We will fix the root cause (T359580); in the meantime, these alerts for this dataset can be ignored.

Thu, Mar 7, 6:39 PM · Performance Issue, Data-Platform
dcausse created T359580: CirrusSearch should not send outdated cirrussearch-request events.
Thu, Mar 7, 6:37 PM · Discovery-Search, CirrusSearch

Tue, Mar 5

dcausse claimed T357966: Document limitations of blazegraph federation.
Tue, Mar 5, 5:38 PM · Discovery-Search (Current work), Wikidata
dcausse moved T353683: Unable to find a file by filename while adding a Commons media file statement from In Progress to Needs review on the Discovery-Search (Current work) board.

Changed the layout of the query a bit by moving the logistic function introduced in T271799 to the top level, so that it wraps the new nearmatch clause.

Tue, Mar 5, 5:25 PM · Patch-For-Review, MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Structured-Data-Backlog, SDAW-MediaSearch, Discovery-Search (Current work), CirrusSearch, Wikidata
dcausse moved T355451: Update URLs on MediaWiki:Elastica-desc from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Tue, Mar 5, 3:25 PM · MW-1.42-notes (1.42.0-wmf.22; 2024-03-12), Discovery-Search (Current work), Elasticsearch

Mon, Mar 4

dcausse claimed T357980: Compile a set of queries rewritten with federation across the two graph splits.

Compiled 10 real world examples at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples

Mon, Mar 4, 7:44 PM · Discovery-Search (Current work), Wikidata
dcausse added a comment to T355040: Compare the results of sparql queries between the fullgraph and the subgraphs.

Final report available at https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/WDQS_Graph_Split_Impact_Analysis

Mon, Mar 4, 7:41 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata
lmata awarded T359033: EPIC: Convert CirrusSearch metrics to statslib a Like token.
Mon, Mar 4, 7:39 PM · Observability-Metrics, Epic, Discovery-Search (Current work), CirrusSearch
dcausse added a comment to T356773: [tracking] Community feedback for the WDQS Split the Graph project.

@Physikerwelt thanks for your feedback.

Mon, Mar 4, 7:25 PM · Discovery-Search (Current work), Wikidata
dcausse added a comment to T356773: [tracking] Community feedback for the WDQS Split the Graph project.

I tried to get the federation working, but got timeouts too. The problem is that the current setup makes splits at the statement level. That is, given statements with some property (e.g. P2860 and P1433), some results are in one QS instance and some are in the other. That means a lot of federation-union combinations to get all results. I posted an example query that is affected (the first I tried) in this issue report: https://github.com/WDscholia/scholia/issues/2423

Mon, Mar 4, 7:02 PM · Discovery-Search (Current work), Wikidata
dcausse moved T353683: Unable to find a file by filename while adding a Commons media file statement from To Be Deployed to In Progress on the Discovery-Search (Current work) board.

The new builder moved the result to #4, which is better but still not enough: it is beaten by 3 other images because of other criteria:

  • weighted_tags:image.linked.from.wikipedia.lead_image/Q458
  • statement_keywords:p180=q458
Mon, Mar 4, 5:00 PM · Patch-For-Review, MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Structured-Data-Backlog, SDAW-MediaSearch, Discovery-Search (Current work), CirrusSearch, Wikidata
dcausse moved T359033: EPIC: Convert CirrusSearch metrics to statslib from Incoming to Epics on the Discovery-Search (Current work) board.
Mon, Mar 4, 4:53 PM · Observability-Metrics, Epic, Discovery-Search (Current work), CirrusSearch
dcausse renamed T359033: EPIC: Convert CirrusSearch metrics to statslib from Convert CirrusSearch metrics to statslib to EPIC: Convert CirrusSearch metrics to statslib.
Mon, Mar 4, 4:52 PM · Observability-Metrics, Epic, Discovery-Search (Current work), CirrusSearch
dcausse moved T355040: Compare the results of sparql queries between the fullgraph and the subgraphs from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mon, Mar 4, 4:15 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata
dcausse moved T355040: Compare the results of sparql queries between the fullgraph and the subgraphs from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Mar 4, 4:15 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata
dcausse moved T355451: Update URLs on MediaWiki:Elastica-desc from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Mar 4, 4:15 PM · MW-1.42-notes (1.42.0-wmf.22; 2024-03-12), Discovery-Search (Current work), Elasticsearch
dcausse moved T353683: Unable to find a file by filename while adding a Commons media file statement from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Mar 4, 4:14 PM · Patch-For-Review, MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Structured-Data-Backlog, SDAW-MediaSearch, Discovery-Search (Current work), CirrusSearch, Wikidata
dcausse added a subtask for T343020: Converting MediaWiki Metrics to StatsLib: T359033: EPIC: Convert CirrusSearch metrics to statslib.
Mon, Mar 4, 10:05 AM · SRE Observability (FY2023/2024-Q3), Observability-Metrics
dcausse added a parent task for T359033: EPIC: Convert CirrusSearch metrics to statslib: T343020: Converting MediaWiki Metrics to StatsLib.
Mon, Mar 4, 10:05 AM · Observability-Metrics, Epic, Discovery-Search (Current work), CirrusSearch
dcausse created T359033: EPIC: Convert CirrusSearch metrics to statslib.
Mon, Mar 4, 10:05 AM · Observability-Metrics, Epic, Discovery-Search (Current work), CirrusSearch

Fri, Mar 1

dcausse added a comment to T316421: Upgrade etherpad.wikimedia.org to v1.9.7.

Since the upgrade I believe that we are affected by https://github.com/ether/etherpad-lite/issues/5401. Wondering if a stale config.json file got kept with padOptions.userName & userColor set to false instead of null.

Fri, Mar 1, 10:59 AM · User-notice-archive, collaboration-services, SRE, Wikimedia-Etherpad

Thu, Feb 29

dcausse updated the task description for T358472: Search dag image_suggestions_weekly failed with: Empty dataframe provided.
Thu, Feb 29, 9:27 AM · Patch-For-Review, Discovery-Search (Current work), Structured-Data-Backlog, Image-Suggestions

Mon, Feb 26

dcausse added a comment to T357980: Compile a set of queries rewritten with federation across the two graph splits.

WIP at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples

Mon, Feb 26, 3:25 PM · Discovery-Search (Current work), Wikidata
dcausse updated the task description for T358472: Search dag image_suggestions_weekly failed with: Empty dataframe provided.
Mon, Feb 26, 1:31 PM · Patch-For-Review, Discovery-Search (Current work), Structured-Data-Backlog, Image-Suggestions
dcausse created T358472: Search dag image_suggestions_weekly failed with: Empty dataframe provided.
Mon, Feb 26, 9:48 AM · Patch-For-Review, Discovery-Search (Current work), Structured-Data-Backlog, Image-Suggestions

Tue, Feb 20

dcausse claimed T355451: Update URLs on MediaWiki:Elastica-desc.
Tue, Feb 20, 3:50 PM · MW-1.42-notes (1.42.0-wmf.22; 2024-03-12), Discovery-Search (Current work), Elasticsearch
dcausse added a subtask for T337013: [Epic] Splitting the graph in WDQS: T357980: Compile a set of queries rewritten with federation across the two graph splits.
Tue, Feb 20, 2:00 PM · Discovery-Search (Current work), Epic, Wikidata-Query-Service, Wikidata
dcausse added a parent task for T357980: Compile a set of queries rewritten with federation across the two graph splits: T337013: [Epic] Splitting the graph in WDQS.
Tue, Feb 20, 2:00 PM · Discovery-Search (Current work), Wikidata
dcausse renamed T357980: Compile a set of queries rewritten with federation across the two graph splits from Compile a set of queries rewritten with federation accross the two graph splits to Compile a set of queries rewritten with federation across the two graph splits.
Tue, Feb 20, 2:00 PM · Discovery-Search (Current work), Wikidata
dcausse created T357980: Compile a set of queries rewritten with federation across the two graph splits.
Tue, Feb 20, 1:58 PM · Discovery-Search (Current work), Wikidata
dcausse added a subtask for T337013: [Epic] Splitting the graph in WDQS: T357966: Document limitations of blazegraph federation.
Tue, Feb 20, 11:03 AM · Discovery-Search (Current work), Epic, Wikidata-Query-Service, Wikidata
dcausse added a parent task for T357966: Document limitations of blazegraph federation: T337013: [Epic] Splitting the graph in WDQS.
Tue, Feb 20, 11:03 AM · Discovery-Search (Current work), Wikidata
dcausse created T357966: Document limitations of blazegraph federation.
Tue, Feb 20, 10:59 AM · Discovery-Search (Current work), Wikidata

Feb 9 2024

dcausse edited P56589 ForceSearchIndex.
Feb 9 2024, 5:06 PM
dcausse edited P56589 ForceSearchIndex.
Feb 9 2024, 3:53 PM
dcausse updated the title for P56589 ForceSearchIndex from untitled to ForceSearchIndex.
Feb 9 2024, 3:46 PM
dcausse created P56589 ForceSearchIndex.
Feb 9 2024, 3:46 PM

Feb 8 2024

dcausse moved T355040: Compare the results of sparql queries between the fullgraph and the subgraphs from In Progress to Needs review on the Discovery-Search (Current work) board.

Draft report up at https://wikitech.wikimedia.org/wiki/User:DCausse/WDQS_Graph_Split_Impact_Analysis

Feb 8 2024, 8:38 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata
dcausse added a comment to T353453: [Analytics] Impact of Scholia on WDQS.

Quick note on this:

Two signals need to be factored in when deriving whether a query is from Scholia. Some queries do start with #tool: scholia as @dcausse suggested, but I checked the user agents and found that the string "Scholia" is also used as a user agent. The key point is that some of the queries have the comment and some have the user agent, but in no case do we have both.
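
A minimal sketch of how the two signals described above could be combined, assuming the query text and the user agent are available as plain strings. The function name and the exact matching rules (case-insensitive, leading whitespace ignored) are assumptions made for illustration, not the analysis code actually used:

```python
def is_scholia_query(query: str, user_agent: str) -> bool:
    """Return True if either Scholia signal is present.

    Signal 1: the query text starts with the comment "#tool: scholia".
    Signal 2: the user agent contains the string "Scholia".
    Matching is case-insensitive here, which is an assumption.
    """
    has_comment = query.lstrip().lower().startswith("#tool: scholia")
    has_user_agent = "scholia" in user_agent.lower()
    # The two signals never co-occur in the logs, so they are OR-ed.
    return has_comment or has_user_agent
```

Because the comment and the user agent never appear together, checking only one of them would undercount Scholia traffic.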

Feb 8 2024, 4:08 PM · Wikidata Analytics (Kanban), Wikidata
dr0ptp4kt awarded T349512: [Analytics] Collect multiple sets of SPARQL queries a Party Time token.
Feb 8 2024, 11:48 AM · Wikidata Analytics (Kanban), Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Feb 2 2024

dcausse updated the task description for T356030: Search dag image_suggestions_weekly failed waiting for analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2024-01-15.
Feb 2 2024, 5:33 PM · Discovery-Search (Current work), Data-Engineering (Sprint 8), Image-Suggestions
dcausse added a comment to T356030: Search dag image_suggestions_weekly failed waiting for analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2024-01-15.

@mfossati OK resumed the dag to process 2024-01-15, marked 2024-01-22 explicitly as failed while you figure out what's going on there.

Feb 2 2024, 5:28 PM · Discovery-Search (Current work), Data-Engineering (Sprint 8), Image-Suggestions
dcausse added a comment to T356030: Search dag image_suggestions_weekly failed waiting for analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2024-01-15.

@mfossati if 2024-01-22 is running it probably means that it's comparing against an index that does not have the 2024-01-15 suggestions. Should I skip 2024-01-15 and let the dag pick 2024-01-22?

Feb 2 2024, 4:18 PM · Discovery-Search (Current work), Data-Engineering (Sprint 8), Image-Suggestions
dcausse moved T353683: Unable to find a file by filename while adding a Commons media file statement from In Progress to Needs review on the Discovery-Search (Current work) board.
Feb 2 2024, 3:26 PM · Patch-For-Review, MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Structured-Data-Backlog, SDAW-MediaSearch, Discovery-Search (Current work), CirrusSearch, Wikidata
dcausse added a comment to T355040: Compare the results of sparql queries between the fullgraph and the subgraphs.

WIP:

  • included the new 100k queries sample named QUERY-Q4 from T349512 (random sample that is representative of the query length and runtime)
  • the % of affected queries (deduplicated) per tool is (sample being the QUERY-Q4 sample mentioned above)
    image.png (470×771 px, 33 KB)

The above graph should be taken with a grain of salt, as the number of queries per data point varies a lot (86 queries for Listeria vs 85k for random). These numbers are still being reviewed, so no conclusions should be drawn yet, but we do not seem to obtain the same numbers that were found originally in Wikidata_Subgraph_Query_Analysis, where 2.5% of the total query count was identified as requiring scholarly articles.
A more qualitative analysis is in progress:

  • analysis of the user agents to understand which use cases are mainly affected; preliminary results show that, for instance, a single UA is the cause of 50% of the affected queries
  • extract some SPARQL queries to start evaluating how federation could be applied/tested
Feb 2 2024, 3:05 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata
dcausse moved T351819: Create a tool that records and compares a set of sparql query results from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Feb 2 2024, 1:42 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse added a comment to T355037: Compare the performance of sparql queries between the full graph and the subgraphs.

@dr0ptp4kt thanks! Is the difference in the number of successful queries only explained by the improvement in query time, or are there some improvements in the number of queries that time out as well?

Feb 2 2024, 9:06 AM · Discovery-Search (Current work), Wikidata

Feb 1 2024

dcausse added a comment to T356400: User aqsloader hasn't MODIFY permissions on image_suggestions.* Cassandra tables anymore.

The airflow dag (search) image_suggestions_weekly has been paused today

Feb 1 2024, 1:17 PM · Patch-For-Review, Discovery-Search (Current work), Structured-Data-Backlog (Current Work), User-Eevans, Cassandra, Data Products
dcausse moved T356400: User aqsloader hasn't MODIFY permissions on image_suggestions.* Cassandra tables anymore from Incoming to Blocked/Waiting on the Discovery-Search (Current work) board.
Feb 1 2024, 1:04 PM · Patch-For-Review, Discovery-Search (Current work), Structured-Data-Backlog (Current Work), User-Eevans, Cassandra, Data Products
dcausse added a project to T356400: User aqsloader hasn't MODIFY permissions on image_suggestions.* Cassandra tables anymore: Discovery-Search (Current work).
Feb 1 2024, 1:04 PM · Patch-For-Review, Discovery-Search (Current work), Structured-Data-Backlog (Current Work), User-Eevans, Cassandra, Data Products

Jan 31 2024

dcausse updated the task description for T355888: Enable cross federation between experimental WDQS endpoints.
Jan 31 2024, 5:41 PM · Patch-For-Review, Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata
dcausse created T356244: MediaSearch should display search warnings.
Jan 31 2024, 8:57 AM · SDAW-MediaSearch, Structured-Data-Backlog
dcausse created T356243: process_sparql_query_hourly sometimes fails on the jena sparql parser.
Jan 31 2024, 8:46 AM · Wikidata-Query-Service

Jan 30 2024

dcausse added a comment to T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities.

Scanning dumps from 2024/01/21 we can find 1623 duplicated statement ids (full list here: https://people.wikimedia.org/~dcausse/T356161_sdc_duplicated_statement_ids.csv)

Jan 30 2024, 2:12 PM · Wikidata, Wikidata-Query-Service, Structured-Data-Backlog, WikibaseMediaInfo
dcausse renamed T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities from WikibaseMediaInfo (or Wikibase?) seems to reuse statement identifiers from other entities to WikibaseMediaInfo seems to reuse statement identifiers from other entities.
Jan 30 2024, 10:48 AM · Wikidata, Wikidata-Query-Service, Structured-Data-Backlog, WikibaseMediaInfo
dcausse added a comment to T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities.

@Lucas_Werkmeister_WMDE thanks for all the context! I get that it only affects WikibaseMediaInfo. Can we exclude Wikibase as a culprit possibly affecting wikidata or should we run a quick investigation to find possible duplicated statement identifiers in the wikidata RDF dumps?

Jan 30 2024, 10:15 AM · Wikidata, Wikidata-Query-Service, Structured-Data-Backlog, WikibaseMediaInfo
dcausse updated the task description for T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities.
Jan 30 2024, 9:46 AM · Wikidata, Wikidata-Query-Service, Structured-Data-Backlog, WikibaseMediaInfo
dcausse created T356161: WikibaseMediaInfo seems to reuse statement identifiers from other entities.
Jan 30 2024, 8:54 AM · Wikidata, Wikidata-Query-Service, Structured-Data-Backlog, WikibaseMediaInfo

Jan 29 2024

dcausse updated the task description for T356030: Search dag image_suggestions_weekly failed waiting for analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2024-01-15.
Jan 29 2024, 1:33 PM · Discovery-Search (Current work), Data-Engineering (Sprint 8), Image-Suggestions
dcausse updated the task description for T356030: Search dag image_suggestions_weekly failed waiting for analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2024-01-15.
Jan 29 2024, 8:32 AM · Discovery-Search (Current work), Data-Engineering (Sprint 8), Image-Suggestions
dcausse updated the task description for T356030: Search dag image_suggestions_weekly failed waiting for analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2024-01-15.
Jan 29 2024, 8:30 AM · Discovery-Search (Current work), Data-Engineering (Sprint 8), Image-Suggestions
dcausse created T356030: Search dag image_suggestions_weekly failed waiting for analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2024-01-15.
Jan 29 2024, 8:28 AM · Discovery-Search (Current work), Data-Engineering (Sprint 8), Image-Suggestions

Jan 26 2024

dcausse added a comment to T355619: Request MediaWiki +2 for Paladox.

I worked with Paladox in the past on the Gerrit codebase, and I trust him to use his +2 rights wisely.

Jan 26 2024, 2:54 PM · MediaWiki-Gerrit-Group-Requests
dcausse awarded T355619: Request MediaWiki +2 for Paladox a Like token.
Jan 26 2024, 2:53 PM · MediaWiki-Gerrit-Group-Requests
dcausse added a comment to T355040: Compare the results of sparql queries between the fullgraph and the subgraphs.

WIP: https://people.wikimedia.org/~dcausse/T355040_EARLY_DRAFT_wdqs_query_results_analysis.html (UA redacted for now)

Jan 26 2024, 9:18 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata

Jan 25 2024

dcausse added a subtask for T351650: Expose 3 new dedicated WDQS endpoints: T355888: Enable cross federation between experimental WDQS endpoints.
Jan 25 2024, 1:51 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse added a parent task for T355888: Enable cross federation between experimental WDQS endpoints: T351650: Expose 3 new dedicated WDQS endpoints.
Jan 25 2024, 1:51 PM · Patch-For-Review, Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata
dcausse created T355888: Enable cross federation between experimental WDQS endpoints.
Jan 25 2024, 1:50 PM · Patch-For-Review, Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata

Jan 24 2024

dcausse moved T355066: SUP: Process (large) JSON responses non-blocking to save memory from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.

Nice improvements seen on the young GC rate after deploying the change:

image.png (481×1 px, 112 KB)

Jan 24 2024, 8:29 PM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch

Jan 19 2024

dcausse added a comment to T355040: Compare the results of sparql queries between the fullgraph and the subgraphs.

Quick report on the progress being made:

  • Our query logs do not only contain SELECT queries, and the sparql client used to collect the data has to be adapted to support the other query forms (ASK, CONSTRUCT, DESCRIBE) (https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/991622)
  • Getting failures due to response size: bumped the limit to 16M but still getting problems. I might stop here and simply tag & ignore such massive queries moving forward
  • Getting very bad numbers from Listeria and MixNMatch (34% and 17% identical respectively). The avg result size is 1.6k and 8k, which might partly explain why getting identical results is difficult; more investigation is needed to understand the cause
  • Getting pretty mediocre numbers for WikidataIntegrator at 88% identical, with a very small avg result size of 8; more investigation needed
  • Pywikibot and SPARQLWrapper are good at 99.4% for both
Jan 19 2024, 1:43 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata

Jan 18 2024

dcausse awarded T355352: Users in archiva-deployer group can't upload artifacts anymore. a Love token.
Jan 18 2024, 6:42 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Data-Engineering (Sprint 7)
dcausse claimed T353683: Unable to find a file by filename while adding a Commons media file statement.
Jan 18 2024, 4:13 PM · Patch-For-Review, MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Structured-Data-Backlog, SDAW-MediaSearch, Discovery-Search (Current work), CirrusSearch, Wikidata

Jan 16 2024

dcausse added a comment to T355122: SonarQube build are failing with Java 11.

tools.jar should only be in jdk8; I'm surprised that this problem did not occur while sonar was running java11

Jan 16 2024, 1:23 PM · Patch-For-Review, Discovery-Search (Current work), Data-Platform-SRE, Data-Platform, Data-Engineering, Release-Engineering-Team

Jan 15 2024

dcausse created T355040: Compare the results of sparql queries between the fullgraph and the subgraphs.
Jan 15 2024, 10:08 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata
dcausse added a subtask for T352538: [EPIC] Evaluate the impact of the graph split: T355037: Compare the performance of sparql queries between the full graph and the subgraphs.
Jan 15 2024, 8:53 AM · Epic, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse added a parent task for T355037: Compare the performance of sparql queries between the full graph and the subgraphs: T352538: [EPIC] Evaluate the impact of the graph split.
Jan 15 2024, 8:53 AM · Discovery-Search (Current work), Wikidata
dcausse renamed T355037: Compare the performance of sparql queries between the full graph and the subgraphs from Com to Compare the performance of sparql queries between the full graph and the subgraphs.
Jan 15 2024, 8:49 AM · Discovery-Search (Current work), Wikidata
dcausse created T355037: Compare the performance of sparql queries between the full graph and the subgraphs.
Jan 15 2024, 8:45 AM · Discovery-Search (Current work), Wikidata

Jan 9 2024

dcausse added a project to T353683: Unable to find a file by filename while adding a Commons media file statement: SDAW-MediaSearch.

Selecting only namespace=6 does trigger the MediaSearch query profile, which does not include the all_near_match field, the one that helps the most to rank almost-perfect title matches at the top.
I believe that the fix would be to update the MediaSearch query builder to include an optional clause on the all_near_match field.
@Cparle do you remember if not including all_near_match was done on purpose, and if adding it would break any existing use cases?
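
As a rough illustration of what an "optional clause" means in Elasticsearch query DSL terms, the sketch below wraps an existing query in a bool query and adds a should clause on all_near_match. It is written as a Python dict builder for readability only. The helper name, the boost parameter and the base query are hypothetical, and the real MediaSearch query builder is not shown here:

```python
import copy
from typing import Any, Dict


def with_optional_near_match(
    base_query: Dict[str, Any], term: str, boost: float = 1.0
) -> Dict[str, Any]:
    """Wrap `base_query` in a bool query with an optional all_near_match clause.

    With a `must` clause present, `should` clauses are not required to match;
    they only raise the score of documents that do match.
    """
    return {
        "bool": {
            "must": [copy.deepcopy(base_query)],
            "should": [
                {"match": {"all_near_match": {"query": term, "boost": boost}}}
            ],
        }
    }
```

Since the clause is optional, documents matched today stay in the result set; only their relative ranking changes.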

Jan 9 2024, 9:28 AM · Patch-For-Review, MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), Structured-Data-Backlog, SDAW-MediaSearch, Discovery-Search (Current work), CirrusSearch, Wikidata

Jan 8 2024

dcausse added a comment to T354517: Search Update Pipeline: HTTP client/proxy config.

Therefore, we should use mw-api-int-async (w/o retries) instead of mw-api-int-async-ro

Jan 8 2024, 3:00 PM · Discovery-Search (Current work), CirrusSearch

Jan 5 2024

dcausse updated the task description for T353460: The consumer job of the SUP does not achieve its expected throughput.
Jan 5 2024, 4:25 PM · Discovery-Search (Current work), CirrusSearch
dcausse updated the task description for T353460: The consumer job of the SUP does not achieve its expected throughput.
Jan 5 2024, 3:36 PM · Discovery-Search (Current work), CirrusSearch
dcausse updated the task description for T353460: The consumer job of the SUP does not achieve its expected throughput.
Jan 5 2024, 1:50 PM · Discovery-Search (Current work), CirrusSearch
dcausse updated the task description for T353460: The consumer job of the SUP does not achieve its expected throughput.
Jan 5 2024, 10:30 AM · Discovery-Search (Current work), CirrusSearch

Jan 4 2024

dcausse added a comment to T354142: 502 error on some Lingua Libre federated queries.

Closed as a duplicate of T299290. From quick testing, it seems that the 502 is triggered depending on the query size:

select * {
  service <https://lingualibre.org/sparql> {
      ?e <https://lingualibre.org/prop/direct/P3> <aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa> . 
  }
}

The query above works, but adding another "a" makes it fail. The problem is reproduced at T299290#7667535 directly against the lingualibre endpoint, without using federation.
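
A minimal sketch of how the size threshold could be located more systematically: one helper generates the probe query for a given IRI length, and a bisection finds the smallest failing length. Both helper names are hypothetical. The `fails` callback stands in for actually sending the query, and the bisection assumes that failure is monotonic in the query size, which the task does not confirm:

```python
from typing import Callable

ENDPOINT = "https://lingualibre.org/sparql"
PREDICATE = "https://lingualibre.org/prop/direct/P3"


def build_probe_query(iri_length: int) -> str:
    """Build the federated probe query with an object IRI of `iri_length` 'a's."""
    obj = "a" * iri_length
    return (
        "select * {\n"
        f"  service <{ENDPOINT}> {{\n"
        f"      ?e <{PREDICATE}> <{obj}> .\n"
        "  }\n"
        "}"
    )


def smallest_failing_length(fails: Callable[[str], bool], lo: int, hi: int) -> int:
    """Bisect the smallest IRI length in (lo, hi] for which `fails(query)` is true.

    Assumes the query succeeds at `lo` and fails at `hi`.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(build_probe_query(mid)):
            hi = mid
        else:
            lo = mid
    return hi
```

In practice `fails` would send the query to the endpoint and report whether a 502 came back. It is left as a callback so that the sketch makes no network calls.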

Jan 4 2024, 6:37 PM · Wikidata, Wikidata-Query-Service
dcausse merged task T354142: 502 error on some Lingua Libre federated queries into T299290: Unexpected behavior in federated queries with LinguaLibre in WDQS.
Jan 4 2024, 6:32 PM · Wikidata, Wikidata-Query-Service
dcausse merged T354142: 502 error on some Lingua Libre federated queries into T299290: Unexpected behavior in federated queries with LinguaLibre in WDQS.
Jan 4 2024, 6:32 PM · Lingua-Libre-Legacy
dcausse edited projects for T354043: Decide the name, domain and logo of WDQS for scholarly articles, added: Wikidata-Query-Service; removed Discovery-Search (Current work).
Jan 4 2024, 3:27 PM · Wikidata-Query-Service, Wikidata

Dec 19 2023

dcausse moved T350465: Load Wikidata split graphs into test servers from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Dec 19 2023, 2:24 PM · Data-Platform-SRE (2023.12.01 - 2023.12.31), Discovery-Search (Current work)
dcausse moved T350465: Load Wikidata split graphs into test servers from Needs Review to Done on the Data-Platform-SRE (2023.12.01 - 2023.12.31) board.

Numbers look correct:

Dec 19 2023, 2:23 PM · Data-Platform-SRE (2023.12.01 - 2023.12.31), Discovery-Search (Current work)
dcausse added a subtask for T350464: Expose SPARQL endpoints with full wikidata data set and with split graph to enable experimentation on federation with a split graph: T352878: Troubleshoot recurring systemd unit failures and availability issues for wdqs1022-24.
Dec 19 2023, 8:44 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse added a parent task for T352878: Troubleshoot recurring systemd unit failures and availability issues for wdqs1022-24: T350464: Expose SPARQL endpoints with full wikidata data set and with split graph to enable experimentation on federation with a split graph.
Dec 19 2023, 8:44 AM · Data-Platform-SRE (2024.01.01 - 2024.01.21)
dcausse renamed T352878: Troubleshoot recurring systemd unit failures and availability issues for wdqs1022-24 from Troubleshoot recurring systemd unit failures for wdqs1022-24 to Troubleshoot recurring systemd unit failures and availability issues for wdqs1022-24.
Dec 19 2023, 8:40 AM · Data-Platform-SRE (2024.01.01 - 2024.01.21)

Dec 18 2023

dcausse created P54490 Calling custom UDF from spark.
Dec 18 2023, 6:31 PM

Dec 15 2023

dcausse updated the task description for T350784: Identify/complete post-migration tasks after rdf-streaming-updater migrates to flink operator.
Dec 15 2023, 5:38 PM · Data-Platform-SRE, Wikidata, Wikidata-Query-Service

Dec 14 2023

dcausse updated the task description for T353460: The consumer job of the SUP does not achieve its expected throughput.
Dec 14 2023, 6:30 PM · Discovery-Search (Current work), CirrusSearch
dcausse added a comment to T353460: The consumer job of the SUP does not achieve its expected throughput.

Bumping envoy resources did help a bit: CPU throttling is reduced (though still a bit present):

image.png (453×1 px, 125 KB)

Dec 14 2023, 6:02 PM · Discovery-Search (Current work), CirrusSearch
dcausse added a subtask for T317045: [Epic] Re-architect the Search Update Pipeline: T353473: The cirrussearch.update_pipeline.update stream should be keyed by wiki and page_id.
Dec 14 2023, 5:12 PM · Discovery-Search (Current work), Epic
dcausse added a parent task for T353473: The cirrussearch.update_pipeline.update stream should be keyed by wiki and page_id: T317045: [Epic] Re-architect the Search Update Pipeline.
Dec 14 2023, 5:12 PM · Discovery-Search (Current work), CirrusSearch