dcausse (David Causse)
User

User Details

User Since
Jun 9 2015, 9:03 AM (306 w, 23 h)
Availability
Available
IRC Nick
dcausse
LDAP User
DCausse
MediaWiki User
DCausse (WMF)

Recent Activity

Yesterday

dcausse moved T275068: Get baseline measurements/expectations for splitting lexemes from Wikidata graph from In Progress to Needs review on the Discovery-Search (Current work) board.

percentage, number of WDQS queries per month that involve Lexemes

percentage, number of the above queries that only involve Lexemes (i.e. doesn't require anything from the larger Wikidata graph)

Using very naive heuristics on one day of data, I extracted 529097 queries involving lexemes.
357917 of them seemed to require data from Wikidata, but I would not trust this number too much: since the language is a Wikidata item, a query requesting labels in a language by its language code rather than its QID falls into the category of queries requiring the Wikidata graph.
I did not run the analysis on the full month because it is rather slow and, given the precision of the heuristics I chose, I would not trust these numbers anyway.
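
As a rough illustration of what these one-day counts imply for the two percentages requested above, the arithmetic can be sketched as follows. The variable names are hypothetical, and the caveat about the precision of the heuristics applies equally to the derived figures.

```python
# One-day counts quoted in the comment above (naive heuristics, low precision).
lexeme_queries = 529097      # queries involving lexemes
needing_wikidata = 357917    # of those, queries that seemed to need the Wikidata graph

# Derived figure (not stated in the comment): queries that appear lexeme-only.
lexeme_only = lexeme_queries - needing_wikidata

# Shares of the lexeme queries, in percent.
share_needing_wikidata = 100 * needing_wikidata / lexeme_queries
share_lexeme_only = 100 * lexeme_only / lexeme_queries

print(f"lexeme-only queries: {lexeme_only}")
print(f"needing the Wikidata graph: {share_needing_wikidata:.1f}%")
print(f"lexeme-only: {share_lexeme_only:.1f}%")
```

On these counts roughly two thirds of the lexeme queries would fall into the "requires the Wikidata graph" bucket, which is an upper bound if the language-code case is over-counted as described.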

Tue, Apr 20, 4:11 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Mon, Apr 19

dcausse closed T280462: bd:sample is not documented in the WDQS manual as Declined.

bd:sample is a blazegraph feature and should be documented on the blazegraph wiki which is referenced from https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#Blazegraph_extensions.

Mon, Apr 19, 6:48 PM · Documentation, Wikidata-Query-Service, Wikidata
dcausse added a comment to T280538: Capture rev_is_revert event data in a stream different than mediawiki.revision-create.

chronology_id was initially added by Stas for the Wikidata Query Service. Given that the search team is actively working on using Flink for the query service updater, I'm not sure they will be using the chronology_id field. @dcausse can you shed some light on whether you need the chronology_id field in the events?

Mon, Apr 19, 5:25 PM · Privacy Engineering, Privacy, Event-Platform, Product-Analytics, Analytics
dcausse added a comment to T280382: WDQS hosts low on /srv disk space.

We might want to exclude wdqs1009 from this for now, since we do not have anywhere else to get its journal from.

Mon, Apr 19, 3:54 PM · Discovery-Search (Current work)
dcausse added a comment to T94019: Generate RDF from JSON.

Indeed, the RDF data is available in the Hive table discovery.wikibase_rdf, but it is generated by reading the TTL dumps, so it might not help for this particular task.
Using Hadoop would indeed allow processing the JSON efficiently, but it has drawbacks, as already pointed out:

  • it requires maintaining the Wikibase -> RDF projection in multiple codebases (PHP Wikibase and Spark)
  • once created on the Hadoop cluster, the output will have to be pushed back to the labstore machine for public consumption, which might add extra delay
Mon, Apr 19, 12:50 PM · wdwb-tech, Wikidata-Campsite, Patch-For-Review, Wikidata
dcausse added a comment to T280482: Validate that OpenSearch is a viable replacement for Elasticsearch for CirrusSearch.

OpenSearch was forked from Elasticsearch 7.10, but CirrusSearch only supports Elasticsearch 6.x, so migrating to 7 (T280482) might be necessary before doing this.

Mon, Apr 19, 8:33 AM · Discovery-Search
dcausse claimed T275068: Get baseline measurements/expectations for splitting lexemes from Wikidata graph.
Mon, Apr 19, 7:49 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse moved T275068: Get baseline measurements/expectations for splitting lexemes from Wikidata graph from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Mon, Apr 19, 7:48 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Fri, Apr 16

dcausse committed rWJVM4a697d2622d5: Initial import (authored by dcausse).
Initial import
Fri, Apr 16, 9:03 AM

Thu, Apr 15

Restricted Application added a project to T275133: Limit query parallelism from Flink based WDQS updater to Wikidata: wdwb-tech.

Since we are going to use Envoy to contact the MW application servers, I wonder if this kind of limit could be enforced by it?

Thu, Apr 15, 2:43 PM · wdwb-tech, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata, serviceops

Wed, Apr 14

dcausse added a comment to T273098: High Availability Flink.

I do see that using the configmap election method is appealing, as it is built in and does not require additional software to function. Unfortunately, I was not able to understand (by briefly reading the docs) whether this uses a separate configmap or the one that is actually used for configuring Flink.
While the former would be okay-ish I guess, the latter will potentially cause problems, as every deployment will result in a re-creation of said configmap by Helm, resetting it to whatever state the chart has defined.

Wed, Apr 14, 1:25 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse created T280131: Evaluate existing tools made to assist relevancy work.
Wed, Apr 14, 12:26 PM · CirrusSearch, Discovery-Search

Tue, Apr 13

dcausse moved T231517: Investigate and fix GC issues on cloudelastic machines from Ops / SRE to needs triage on the Discovery-Search board.

moving to needs triage to raise visibility

Tue, Apr 13, 3:59 PM · Patch-For-Review, Discovery-Search
dcausse added a comment to T231517: Investigate and fix GC issues on cloudelastic machines.

Most probably due to the recent reindex (T274200). It looks like cloudelastic does not have the capacity to support a reindex of our large indices (commons and wikidata). It is worth noting that we are investigating creating dedicated production clusters for these two indices (T265621). Should we reconsider the size of the cloudelastic cluster (add even more machines), or perhaps have a dedicated cloudelastic cluster for wikidata and commons?

Tue, Apr 13, 12:24 PM · Patch-For-Review, Discovery-Search

Thu, Apr 8

dcausse updated the task description for T279698: WDQS should retry when getting 404s.
Thu, Apr 8, 6:40 PM · Wikidata, Wikidata-Query-Service
dcausse updated the task description for T279698: WDQS should retry when getting 404s.
Thu, Apr 8, 5:44 PM · Wikidata, Wikidata-Query-Service
dcausse created T279698: WDQS should retry when getting 404s.
Thu, Apr 8, 5:41 PM · Wikidata, Wikidata-Query-Service
dcausse added a comment to T279639: Items sometimes repeat in the Search and Item dropdowns.

Another weird behavior is that you can expand the 7 results without asking for more:

Thu, Apr 8, 12:46 PM · Discovery-Search, Wikidata
dcausse added a comment to T279639: Items sometimes repeat in the Search and Item dropdowns.

@Moebeus thanks for the report, do you know if the duplicates appear after clicking more to display the remaining results or directly?
If they appear directly could you check by scrolling down if all the first 7 results are duplicated?
By default only 7 items are searched and shown, more can be displayed only if you hit the more button.

Thu, Apr 8, 10:32 AM · Discovery-Search, Wikidata
dcausse edited projects for T279639: Items sometimes repeat in the Search and Item dropdowns, added: Discovery-Search; removed Discovery.
Thu, Apr 8, 10:16 AM · Discovery-Search, Wikidata
dcausse updated the task description for T279607: Clean up failed reindexing indexes.
Thu, Apr 8, 7:36 AM · Discovery-Search (Current work)

Wed, Apr 7

dcausse updated the task description for T279541: Add a reconciliation strategy to the wdqs streaming updater.
Wed, Apr 7, 2:33 PM · Wikidata-Query-Service, Wikidata
dcausse added a subtask for T244590: [Epic] Rework the WDQS updater as an event driven application: T279541: Add a reconciliation strategy to the wdqs streaming updater.
Wed, Apr 7, 2:24 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service, Epic
dcausse added a parent task for T279541: Add a reconciliation strategy to the wdqs streaming updater: T244590: [Epic] Rework the WDQS updater as an event driven application.
Wed, Apr 7, 2:24 PM · Wikidata-Query-Service, Wikidata
dcausse updated the task description for T279541: Add a reconciliation strategy to the wdqs streaming updater.
Wed, Apr 7, 2:23 PM · Wikidata-Query-Service, Wikidata
dcausse created T279541: Add a reconciliation strategy to the wdqs streaming updater.
Wed, Apr 7, 2:22 PM · Wikidata-Query-Service, Wikidata
dcausse moved T278209: MediaSearch results not updated 12 hours after overwriting image from In Progress to Needs review on the Discovery-Search (Current work) board.
Wed, Apr 7, 10:02 AM · MW-1.37-notes (1.37.0-wmf.3; 2021-04-27), Discovery-Search (Current work), SDAW-MediaSearch, Structured-Data-Backlog, CirrusSearch
dcausse closed T279500: When searching with incategory on local wiki, results from Commons categories are included as Declined.

When File is part of the searched namespaces on WMF wikis, Commons is searched too. The shape of the search query might inhibit this behavior, but incategory is explicitly declared as a keyword that is allowed to work on Commons. I don't have strong opinions on this, but I think this feature has been here for a while so I guess it's OK.
To answer your last statement: it's a feature, not an issue or a side effect. As a workaround, the local: prefix can be used to forcibly ignore results from Commons.
I'm declining as this is working as expected but feel free to re-open if you believe this feature should be changed.

Wed, Apr 7, 7:33 AM · Discovery-Search, CirrusSearch

Tue, Apr 6

dcausse awarded T279443: External referrer & WDQS metrics stopped updating on 2021-02-08 a Love token.
Tue, Apr 6, 3:20 PM · Product-Analytics (Kanban)

Thu, Apr 1

dcausse moved T270476: Linked Data Fragments endpoint returns IllegalStateException from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Thu, Apr 1, 7:23 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse claimed T278209: MediaSearch results not updated 12 hours after overwriting image.
Thu, Apr 1, 1:22 PM · MW-1.37-notes (1.37.0-wmf.3; 2021-04-27), Discovery-Search (Current work), SDAW-MediaSearch, Structured-Data-Backlog, CirrusSearch
dcausse moved T270476: Linked Data Fragments endpoint returns IllegalStateException from In Progress to Needs review on the Discovery-Search (Current work) board.
Thu, Apr 1, 1:20 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Wed, Mar 31

dcausse claimed T270476: Linked Data Fragments endpoint returns IllegalStateException.
Wed, Mar 31, 2:58 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse moved T199219: WDQS should use internal endpoint to communicate to Wikidata from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Wed, Mar 31, 2:54 PM · Discovery-Search (Current work), Performance-Team (Radar), Wikidata, Wikidata-Query-Service
dcausse moved T278385: Streaming Updater must make all requests to proxy endpoints from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Wed, Mar 31, 2:48 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Tue, Mar 30

dcausse closed T278693: Manually purge obsolete/outdated entites from WDQS (2021-03) as Resolved.

I read the announcement and I am pretty excited about the improvements. The query-preview servers do not seem to have the problem that I have reported here, but I am not sure right now whether you have reloaded the entities there as well.

Tue, Mar 30, 2:45 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse moved T278693: Manually purge obsolete/outdated entites from WDQS (2021-03) from Ready for Development to Needs Reporting on the Discovery-Search (Current work) board.

@MisterSynergy thanks for the report!

Tue, Mar 30, 8:13 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Mon, Mar 29

dcausse moved T276750: Add means to upgrade the flink code even when incompatible serialization changes are involved from Blocked (from outside the team) to Needs review on the Discovery-Search (Current work) board.
Mon, Mar 29, 3:26 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse added a comment to T274982: Disable fetching constraints from the wdqs updater.

Thanks for bringing this here. This link is generated from https://www.wikidata.org/wiki/Module:Constraints/SPARQL and seems to be added to all properties except the few that define no constraint.
Digging more through the impact over the 370 queries using wikibase:hasViolationForConstraint for March (1st -> 28th):

Mon, Mar 29, 1:26 PM · Discovery-Search (Current work), Patch-For-Review, Wikidata
dcausse moved T277637: Report latency metric to the wdqs-ui from the wdqs streaming updater from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Mon, Mar 29, 7:37 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse moved T276784: Recover lexemes on wdqs1009 from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Mon, Mar 29, 7:37 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse moved T277637: Report latency metric to the wdqs-ui from the wdqs streaming updater from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Mar 29, 7:37 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Fri, Mar 26

dcausse added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

I think one of the reasons bucketing was done in the frontend was to better detect search session boundaries. Doing this on the backend without a state per identity, you would have to set arbitrary boundaries, I think.

Fri, Mar 26, 8:59 AM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)
dcausse added a comment to T262612: Run an A/B test using suggestions generated using glent Method 1.

Quickly looked when I saw that frwiki has the new search widget enabled but not dewiki/enwiki. Looking at the data it seems frwiki is heavily affected (~20% of the sessions have an event in mismatch or invalid as opposed to 1%/2% for other wikis):

Fri, Mar 26, 8:42 AM · MW-1.36-notes (1.36.0-wmf.37; 2021-03-30), Patch-For-Review, Discovery-Search (Current work)

Thu, Mar 25

dcausse updated the task description for T278385: Streaming Updater must make all requests to proxy endpoints.
Thu, Mar 25, 7:21 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Tue, Mar 23

dcausse moved T277637: Report latency metric to the wdqs-ui from the wdqs streaming updater from in progress to incoming on the Wikidata board.
Tue, Mar 23, 6:14 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse moved T277637: Report latency metric to the wdqs-ui from the wdqs streaming updater from Ready for Development to Needs review on the Discovery-Search (Current work) board.
Tue, Mar 23, 6:13 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse added a comment to T277637: Report latency metric to the wdqs-ui from the wdqs streaming updater.

Suggestion for a better long term solution here: T278246

Tue, Mar 23, 6:13 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse renamed T278246: Report WDQS update latency when displaying/serving results from WDQS latency to Report WDQS update latency when displaying/serving results.
Tue, Mar 23, 5:51 PM · Wikidata, Wikidata-Query-Service
dcausse created T278246: Report WDQS update latency when displaying/serving results.
Tue, Mar 23, 4:37 PM · Wikidata, Wikidata-Query-Service
dcausse created P15032 updater looping.
Tue, Mar 23, 10:23 AM
dcausse claimed T277637: Report latency metric to the wdqs-ui from the wdqs streaming updater.
Tue, Mar 23, 7:53 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse moved T276750: Add means to upgrade the flink code even when incompatible serialization changes are involved from In Progress to Blocked (from outside the team) on the Discovery-Search (Current work) board.
Tue, Mar 23, 7:52 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Mon, Mar 22

dcausse added a subtask for T268648: [EPIC] MediaSearch should use a dedicated service/query for doing its concept-lookup instead of the wikidata search API: T278155: Create commons/wikidata dataset for MediaSearch.
Mon, Mar 22, 5:08 PM · Epic, Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-ReleaseCandidate2), Discovery-Search, CirrusSearch
dcausse added a parent task for T278155: Create commons/wikidata dataset for MediaSearch: T268648: [EPIC] MediaSearch should use a dedicated service/query for doing its concept-lookup instead of the wikidata search API.
Mon, Mar 22, 5:08 PM · Discovery-Search (Current work), CirrusSearch
dcausse created T278155: Create commons/wikidata dataset for MediaSearch.
Mon, Mar 22, 5:07 PM · Discovery-Search (Current work), CirrusSearch
dcausse removed a project from T277691: Argument 1 passed to DataValues\Geo\Values\LatLongValue::__construct() must be of the type float, string given: GeoData.
Mon, Mar 22, 3:15 PM · Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), wdwb-tech, User-brennen, Wikidata, Wikimedia-production-error

Mar 19 2021

dcausse updated the task description for T277665: [L] Determine an IRI to join commons mediainfo entities and wikidata properties referencing commons images.
Mar 19 2021, 3:32 PM · Structured-Data-Backlog (Current Work), WikibaseMediaInfo
dcausse added a comment to T274354: rdf munger and hence wdqs-updater requires siteLinks to be formed using a specific articlePath.

@despens could you provide a reproducible test case? A small RDF file that triggers the problem would be great. I don't see how site links could be involved in the problem you raise, and a test case will definitely help. Thanks!

Mar 19 2021, 3:21 PM · Wikidata-Query-Service, wdwb-tech, Wikidata, Wikibase
dcausse moved T263427: Unable to process a particular wikibase dump using munge.sh (localised namespace name) from Small Bugs to All WDQS-related tasks on the Wikidata-Query-Service board.
Mar 19 2021, 3:06 PM · wdwb-tech, Wikidata-Query-Service, Wikidata, User-Nikerabbit, Wikibase-Containers
dcausse added a comment to T263427: Unable to process a particular wikibase dump using munge.sh (localised namespace name).

I see two ways to fix this:

Mar 19 2021, 3:05 PM · wdwb-tech, Wikidata-Query-Service, Wikidata, User-Nikerabbit, Wikibase-Containers

Mar 17 2021

dcausse updated the task description for T277665: [L] Determine an IRI to join commons mediainfo entities and wikidata properties referencing commons images.
Mar 17 2021, 3:02 PM · Structured-Data-Backlog (Current Work), WikibaseMediaInfo
dcausse added a comment to T258776: Add Structured Data on Commons M-ID to Wikidata dumps.

Note: in an attempt to unblock the status quo I created T277665 with some practical solutions (especially the first one suggested in T258769#6332430).

Mar 17 2021, 3:00 PM · StructuredDataOnCommons, Wikidata-Query-Service, Wikidata
dcausse added a comment to T258769: ImageGrid for WCQS.

Note: in an attempt to unblock the status quo I created T277665 with some practical solutions (especially the first one suggested in T258769#6332430).

Mar 17 2021, 2:59 PM · Commons, Wikidata-Query-Service, Wikidata
dcausse created T277665: [L] Determine an IRI to join commons mediainfo entities and wikidata properties referencing commons images.
Mar 17 2021, 2:56 PM · Structured-Data-Backlog (Current Work), WikibaseMediaInfo
dcausse created T277637: Report latency metric to the wdqs-ui from the wdqs streaming updater.
Mar 17 2021, 8:40 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Mar 16 2021

dcausse created T277565: Misleading markup placements when querying items located around the 180th meridian.
Mar 16 2021, 4:43 PM · Wikidata-Campsite, Wikidata, Wikidata-Query-Service
dcausse added a comment to T265914: Investigate Resource Needs for Commons and Wikidata Elasticsearch indices.

Sounds good to me. Given the unpredictable growth of commons (current coverage of captions and depicts statements is below 10%), I think it's sane to have at least 12 machines.

Mar 16 2021, 9:35 AM · Discovery-Search (Current work)
dcausse merged task T277108: Query service throws exception for non-English wikis into T263427: Unable to process a particular wikibase dump using munge.sh (localised namespace name).
Mar 16 2021, 8:11 AM · Wikidata, Wikibase, Wikidata-Query-Service
dcausse merged T277108: Query service throws exception for non-English wikis into T263427: Unable to process a particular wikibase dump using munge.sh (localised namespace name).
Mar 16 2021, 8:11 AM · wdwb-tech, Wikidata-Query-Service, Wikidata, User-Nikerabbit, Wikibase-Containers
dcausse added a comment to T277108: Query service throws exception for non-English wikis.

Tentatively closing as a duplicate of T263427 as this sounds very similar, please re-open if you think it's completely different or if the workaround mentioned there does not work for you.

Mar 16 2021, 8:10 AM · Wikidata, Wikibase, Wikidata-Query-Service

Mar 15 2021

dcausse added a comment to T269493: Add hasrecommendation: search keyword.

For reference BC patch by Erik: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/671214

Mar 15 2021, 3:47 PM · MW-1.36-notes (1.36.0-wmf.36; 2021-03-23), Growth-Team (Current Sprint), Add-Link, Image-Recommendations, Discovery-Search (Current work), CirrusSearch
dcausse added a subtask for T244590: [Epic] Rework the WDQS updater as an event driven application: T277443: The streaming updater consumer should log information when divergences are detected.
Mar 15 2021, 9:30 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service, Epic
dcausse added a parent task for T277443: The streaming updater consumer should log information when divergences are detected: T244590: [Epic] Rework the WDQS updater as an event driven application.
Mar 15 2021, 9:30 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse created T277443: The streaming updater consumer should log information when divergences are detected.
Mar 15 2021, 9:22 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Mar 12 2021

dcausse moved T276571: selenium-daily-beta-CirrusSearch is failing from Incoming to Needs review on the Discovery-Search (Current work) board.
Mar 12 2021, 8:45 AM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Discovery-Search (Current work), CirrusSearch
dcausse added a comment to T276571: selenium-daily-beta-CirrusSearch is failing.

Thanks @Jdrewniak!
I think I'll go with ?useskinversion=1 for now and wait for the new widget to become the default to switch to it in the test code.

Mar 12 2021, 8:23 AM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Discovery-Search (Current work), CirrusSearch
dcausse added a project to T273266: [L] Commons mediainfo dumps retention: Structured-Data-Backlog.

Pinging SD folks as they've worked on this dump IIRC.

Mar 12 2021, 8:10 AM · Structured-Data-Backlog (Current Work), WikibaseMediaInfo, Dumps-Generation

Mar 9 2021

dcausse moved T276784: Recover lexemes on wdqs1009 from In Progress to Needs review on the Discovery-Search (Current work) board.

Reprocessed all updates related to lexemes on wdqs1009 using a custom build with https://gerrit.wikimedia.org/r/670090

Mar 9 2021, 1:42 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse added a comment to T276827: Apply search query length limit exemption to negated queries.

This makes perfect sense and I think this can be considered a bug. I think that the code simply ignores that the keyword node (in the AST) can be wrapped inside a NegatedNode and will incorrectly skip negated keywords.
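
The suspected bug can be illustrated with a minimal sketch. It is written in Python rather than CirrusSearch's PHP, and every name except NegatedNode (mentioned above) is hypothetical; it does not reproduce the actual CirrusSearch parser classes or the length-limit check.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class WordsNode:
    """Plain search words (hypothetical stand-in)."""
    text: str


@dataclass
class KeywordNode:
    """A search keyword such as incategory:Foo (hypothetical stand-in)."""
    keyword: str
    value: str


@dataclass
class NegatedNode:
    """Wraps another node to negate it, e.g. -incategory:Foo."""
    child: "Node"


Node = Union[WordsNode, KeywordNode, NegatedNode]


def find_keywords_naive(nodes: List[Node]) -> List[KeywordNode]:
    # Only looks at top-level nodes, so a keyword wrapped in NegatedNode is missed.
    return [n for n in nodes if isinstance(n, KeywordNode)]


def find_keywords(nodes: List[Node]) -> List[KeywordNode]:
    # Unwraps negation before testing the node type.
    found = []
    for node in nodes:
        while isinstance(node, NegatedNode):
            node = node.child
        if isinstance(node, KeywordNode):
            found.append(node)
    return found


query = [
    WordsNode("cats"),
    KeywordNode("incategory", "Felines"),
    NegatedNode(KeywordNode("hastemplate", "Stub")),
]
print(len(find_keywords_naive(query)), len(find_keywords(query)))
```

Under this reading, any exemption that relies on the naive traversal would never see the negated keyword, which matches the behavior reported in the task.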

Mar 9 2021, 9:46 AM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Growth-Team (Current Sprint), Discovery-Search, CirrusSearch

Mar 8 2021

dcausse created T276784: Recover lexemes on wdqs1009.
Mar 8 2021, 2:05 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse moved T276750: Add means to upgrade the flink code even when incompatible serialization changes are involved from Incoming to In Progress on the Discovery-Search (Current work) board.
Mar 8 2021, 9:13 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse added a subtask for T244590: [Epic] Rework the WDQS updater as an event driven application: T276750: Add means to upgrade the flink code even when incompatible serialization changes are involved.
Mar 8 2021, 9:09 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service, Epic
dcausse added a parent task for T276750: Add means to upgrade the flink code even when incompatible serialization changes are involved: T244590: [Epic] Rework the WDQS updater as an event driven application.
Mar 8 2021, 9:09 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse claimed T276750: Add means to upgrade the flink code even when incompatible serialization changes are involved.
Mar 8 2021, 9:09 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse created T276750: Add means to upgrade the flink code even when incompatible serialization changes are involved.
Mar 8 2021, 8:58 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Mar 5 2021

dcausse added a comment to T276571: selenium-daily-beta-CirrusSearch is failing.

For info the test started to fail around Feb 12 (https://integration.wikimedia.org/ci/view/Selenium/job/selenium-daily-beta-CirrusSearch/)

Mar 5 2021, 2:41 PM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Discovery-Search (Current work), CirrusSearch
dcausse updated subscribers of T276571: selenium-daily-beta-CirrusSearch is failing.

@Jdrewniak would you be aware of a change that might have changed the way autocomplete suggestions are displayed on https://en.wikipedia.beta.wmflabs.org/ ?

Mar 5 2021, 2:28 PM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Discovery-Search (Current work), CirrusSearch
dcausse created T276571: selenium-daily-beta-CirrusSearch is failing.
Mar 5 2021, 2:25 PM · MW-1.36-notes (1.36.0-wmf.35; 2021-03-16), Discovery-Search (Current work), CirrusSearch
dcausse added a comment to T265914: Investigate Resource Needs for Commons and Wikidata Elasticsearch indices.

I think that the estimate of 20 TB is not enough to support the current shape of the indices, mainly because we have a giant index at 11 TB. Assuming the worst-case scenario (reindexing commonswiki_file will require another 11 TB), 33 TB is what is needed to reindex commonswiki_file without reaching the 75% watermark.
Adding 4 machines will increase the overall usable disk space to 33.6 TB, so I would suggest at least 14 machines to comfortably support the current sizes. Now, should we assume that the expected decrease of commonswiki_file due to file_text truncation and my over-pessimistic reindex scenario will compensate for future growth? Hard to say.
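
The machine count can be cross-checked with a small sketch. It takes the 33 TB requirement and the 33.6 TB figure from the comment as given, reads the latter as the usable space of a 14-machine cluster, and assumes that usable space scales linearly with the number of machines; the function and variable names are hypothetical.

```python
# Figures from the comment above, expressed in GB to keep the arithmetic integral.
required_gb = 33_000        # space needed to reindex commonswiki_file below the watermark
cluster_gb_at_14 = 33_600   # usable space read as belonging to a 14-machine cluster

# Assumption: every machine contributes the same usable space.
per_machine_gb = cluster_gb_at_14 // 14


def machines_needed(required: int, per_machine: int) -> int:
    """Smallest machine count whose combined usable space covers `required`."""
    return -(-required // per_machine)  # ceiling division on integers


print(per_machine_gb, machines_needed(required_gb, per_machine_gb))
```

With these inputs 13 machines fall just short of the requirement, which is consistent with the suggestion of at least 14.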

Mar 5 2021, 8:22 AM · Discovery-Search (Current work)

Mar 4 2021

dcausse added a comment to T267971: Analyze Speaker-Reviewed M2 Data for Chinese.

Great write-up thanks!

Mar 4 2021, 5:19 PM · Discovery-Search (Current work), Chinese-Sites

Mar 3 2021

dcausse added a comment to T244341: Stop using blank nodes for encoding SomeValue and OWL constraints in WDQS.

It’s probably worth mentioning in that documentation that this change applies not just to the query service but also to the RDF dumps and Special:EntityData. Otherwise, it looks good to me :)

Mar 3 2021, 1:19 PM · Community-consensus-needed, Wikidata-Query-Service, Wikidata
dcausse triaged T276198: /var/run/elasticsearch deleted by elasticsearch as High priority.

Triaging to high as this can cause serious problems.
The cause seems to be in elastic itself, but I could not spot the exact problem looking at the elastic code. We might want to work around the issue by always running systemd-tmpfiles --create from the elasticsearch systemd unit to make sure the folder exists when it's needed.

Mar 3 2021, 12:13 PM · Discovery-Search (Current work), SRE
dcausse closed T275975: Search broken on beta cluster wikis, a subtask of T274204: Deploy new version of Extra Plugin (with Khmer filter) to Elasticsearch cluster, as Resolved.
Mar 3 2021, 8:01 AM · Discovery-Search (Current work)
dcausse closed T275975: Search broken on beta cluster wikis, a subtask of T276038: cirrussearch-backend-error at beta cluster, as Resolved.
Mar 3 2021, 8:01 AM · Pywikibot, Upstream, Pywikibot-tests
dcausse closed T275975: Search broken on beta cluster wikis as Resolved.

Search is functional again.

Mar 3 2021, 8:01 AM · Discovery-Search (Current work), Beta-Cluster-Infrastructure

Mar 2 2021

dcausse added a subtask for T274204: Deploy new version of Extra Plugin (with Khmer filter) to Elasticsearch cluster: T275975: Search broken on beta cluster wikis.
Mar 2 2021, 4:45 PM · Discovery-Search (Current work)
dcausse added a parent task for T275975: Search broken on beta cluster wikis: T274204: Deploy new version of Extra Plugin (with Khmer filter) to Elasticsearch cluster.
Mar 2 2021, 4:45 PM · Discovery-Search (Current work), Beta-Cluster-Infrastructure
dcausse added a comment to T275975: Search broken on beta cluster wikis.

It's related to T274204

Mar 2 2021, 4:45 PM · Discovery-Search (Current work), Beta-Cluster-Infrastructure
dcausse added a project to T276198: /var/run/elasticsearch deleted by elasticsearch: Discovery-Search (Current work).
Mar 2 2021, 3:39 PM · Discovery-Search (Current work), SRE