dcausse (David Causse)
User

User Details

User Since
Jun 9 2015, 9:03 AM (245 w, 55 m)
Availability
Available
IRC Nick
dcausse
LDAP User
DCausse
MediaWiki User
DCausse (WMF) [ Global Accounts ]

Recent Activity

Today

dcausse added a comment to T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints.

I haven't checked, but I hope that at most one blank node can be attached to the same subject/predicate; if not, this makes the sync algorithm a bit more complex.

At least currently, this is not the case. I added a second “partner: unknown value” statement to the sandbox item, and now wd:Q4115189 wdt:P451 ?v produces two blank nodes as result.

Tue, Feb 18, 8:36 AM · Wikidata-Query-Service, Wikidata

Yesterday

dcausse added a comment to T196165: Commons image: when pasting the exact title, get the correct file first in the suggester.

I believe that because the file name has many words, the score on the tokenized text fields is very high (since we sum all token scores). The exact match has only one word, and despite its high weight, its score is not enough to compensate for the loss of its text matches, which are discarded because of the negation.
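
As a rough illustration of the effect described in this comment, the toy calculation below sums per-token scores on a tokenized field and compares the total with a single weighted exact-match clause. Every number here, including the weight, is invented for illustration; none of them are actual CirrusSearch scoring values.

```python
# Toy model only: the scores and the weight are made-up numbers,
# not values taken from CirrusSearch or Elasticsearch.

def tokenized_score(token_scores):
    """Tokenized text fields: per-token scores are summed."""
    return sum(token_scores)

def exact_match_score(base_score, weight):
    """Exact-match field: a single term, boosted by a weight."""
    return base_score * weight

# A hypothetical competing file whose 12-word title matches on every token.
other_file = tokenized_score([2.0] * 12)

# The wanted file only keeps its exact-title clause (hypothetical weight 10),
# because its text matches were discarded.
exact_file = exact_match_score(1.5, 10)

print(other_file, exact_file, other_file > exact_file)
```

With many tokens, the summed score grows with title length, while the exact-match clause contributes only once however large its weight is.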

Mon, Feb 17, 2:41 PM · MW-1.35-notes (1.35.0-wmf.20; 2020-02-18), Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikidata
dcausse added a project to T203397: Provide more useful redirect for statement nodes (wds:…): Discovery-Search (Current work).

@Lea_Lacroix_WMDE no, we just need to deploy it, sorry for the delay.

Mon, Feb 17, 1:52 PM · Discovery-Search (Current work), Patch-For-Review, Wikidata
dcausse added a comment to T243419: Quotation marks mask strings from internal search thus excluding them from search results.

I'd consider this a bug indeed. I suspect the tokenization algorithm of the default search backend is quite limited in that it cannot properly discard punctuation.

Mon, Feb 17, 1:46 PM · MediaWiki-Search, Discovery-Search
dcausse renamed T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints from Wikibase RDF dump: stop using blank nodes for encoding unknown values and OWL constraints to Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints.
Mon, Feb 17, 1:29 PM · Wikidata-Query-Service, Wikidata
dcausse added a comment to T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints.

Thanks for all the feedback.
I'll discard the "constant" option.

Mon, Feb 17, 1:14 PM · Wikidata-Query-Service, Wikidata

Fri, Feb 7

dcausse created T244590: EPIC: Rework the WDQS updater as an event driven application.
Fri, Feb 7, 5:51 PM · Wikidata, Wikidata-Query-Service, Epic
dcausse updated the task description for T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints.
Fri, Feb 7, 2:11 PM · Wikidata-Query-Service, Wikidata
dcausse updated the task description for T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints.
Fri, Feb 7, 1:56 PM · Wikidata-Query-Service, Wikidata
dcausse updated the task description for T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints.
Fri, Feb 7, 1:55 PM · Wikidata-Query-Service, Wikidata

Thu, Feb 6

dcausse added a comment to T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints.

Yes, the issue with blank nodes is that they are not "reference-able", so point delete queries are impossible, and point deletes are what we want to achieve with the next-gen updater.
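
A minimal sketch of why this matters, assuming the updater would issue SPARQL `DELETE DATA` requests: SPARQL 1.1 Update does not permit blank nodes in `DELETE DATA`, so a triple whose object is a blank node cannot be named for deletion. The helper `build_delete_data` and the `wd:Q5` value are hypothetical and only serve the illustration; this is not the WDQS updater's code.

```python
def is_blank_node(term):
    """Blank nodes are written with the '_:' prefix in Turtle/SPARQL."""
    return term.startswith("_:")

def build_delete_data(triples):
    """Build a DELETE DATA request for fully named triples.

    Hypothetical helper: it refuses blank nodes because their labels are
    local to a document and cannot reference a node in the store.
    """
    for triple in triples:
        if any(is_blank_node(term) for term in triple):
            raise ValueError(f"cannot point-delete a blank node triple: {triple}")
    body = "\n".join(f"  {s} {p} {o} ." for s, p, o in triples)
    return "DELETE DATA {\n" + body + "\n}"

# A triple made only of IRIs can be deleted by naming it exactly.
print(build_delete_data([("wd:Q3", "wdt:P2", "wd:Q5")]))

# A triple with a blank node object cannot.
try:
    build_delete_data([("wd:Q3", "wdt:P2", "_:b0")])
except ValueError as err:
    print(err)
```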

Thu, Feb 6, 4:47 PM · Wikidata-Query-Service, Wikidata
dcausse triaged T221709: scap service restarts for WDQS are inconsistent as High priority.
Thu, Feb 6, 9:11 AM · Wikidata, Scap, Wikidata-Query-Service

Wed, Feb 5

dcausse added a comment to T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints.

If the problem is just the blank nodes themselves, why not use this new wdunk:P2 in the same way, as in wd:Q3 wdt:P2 wdunk:P2? That’s still worse than the blank nodes (multiple “unknown value” statements collapse into one triple, just as is currently the case for “no value” statements), but at least it shouldn’t break as many queries.
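
The collapse mentioned in this comment follows from an RDF graph being a set of triples. The sketch below models a graph as a Python set of string tuples, which is a simplification rather than how Blazegraph stores data; the blank node labels `_:b1` and `_:b2` are illustrative.

```python
# Two "unknown value" statements encoded with the same constant object
# produce the same triple, so the set keeps only one of them.
constant_graph = set()
constant_graph.add(("wd:Q3", "wdt:P2", "wdunk:P2"))
constant_graph.add(("wd:Q3", "wdt:P2", "wdunk:P2"))
constant_count = len(constant_graph)

# With blank nodes, each statement gets its own node, so both survive.
blank_graph = {
    ("wd:Q3", "wdt:P2", "_:b1"),
    ("wd:Q3", "wdt:P2", "_:b2"),
}
blank_count = len(blank_graph)

print(constant_count, blank_count)
```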

Wed, Feb 5, 3:22 PM · Wikidata-Query-Service, Wikidata
dcausse created T244341: Wikibase RDF dump: stop using blank nodes for encoding SomeValue and OWL constraints.
Wed, Feb 5, 10:33 AM · Wikidata-Query-Service, Wikidata

Thu, Jan 23

dcausse awarded T221774: Add Wikidata query service lag to Wikidata maxlag a Love token.
Thu, Jan 23, 1:44 PM · MW-1.35-notes (1.35.0-wmf.8; 2019-11-26), User-Addshore, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), MW-1.34-notes (1.34.0-wmf.21; 2019-09-03), Patch-For-Review, observability, Wikidata-Query-Service, Wikidata
dcausse renamed T243496: Add mstyles to the gerrit group wikidata-query from Add mstyles to the gerrit group wikidata/query to Add mstyles to the gerrit group wikidata-query.
Thu, Jan 23, 10:18 AM · Discovery-Search (Current work), Release-Engineering-Team, Gerrit-Privilege-Requests
dcausse renamed T243431: Grant more rights to wikidata/query/rdf for the group wikidata-query (similar to search) from Grant more rights to wikidata/query/rdf for the group wikidata/query (similar to search) to Grant more rights to wikidata/query/rdf for the group wikidata-query (similar to search).
Thu, Jan 23, 10:17 AM · Patch-For-Review, Release-Engineering-Team, Wikidata, Gerrit-Privilege-Requests, Wikidata-Query-Service
dcausse moved T243496: Add mstyles to the gerrit group wikidata-query from In Progress to Waiting on the Discovery-Search (Current work) board.
Thu, Jan 23, 10:11 AM · Discovery-Search (Current work), Release-Engineering-Team, Gerrit-Privilege-Requests
dcausse created T243496: Add mstyles to the gerrit group wikidata-query.
Thu, Jan 23, 10:11 AM · Discovery-Search (Current work), Release-Engineering-Team, Gerrit-Privilege-Requests
dcausse added a project to T243431: Grant more rights to wikidata/query/rdf for the group wikidata-query (similar to search): Release-Engineering-Team.
Thu, Jan 23, 10:08 AM · Patch-For-Review, Release-Engineering-Team, Wikidata, Gerrit-Privilege-Requests, Wikidata-Query-Service

Wed, Jan 22

dcausse created T243431: Grant more rights to wikidata/query/rdf for the group wikidata-query (similar to search).
Wed, Jan 22, 5:48 PM · Patch-For-Review, Release-Engineering-Team, Wikidata, Gerrit-Privilege-Requests, Wikidata-Query-Service

Tue, Jan 21

dcausse created T243292: Fix the munger to support commons RDF dump.
Tue, Jan 21, 3:06 PM · Wikidata-Query-Service, Wikidata
dcausse added projects to T243270: Test commons RDF dumps on sdcquery.wmflabs.org: Wikidata-Query-Service, Discovery-Search (Current work).
Tue, Jan 21, 10:11 AM · Wikidata, Discovery-Search (Current work), Wikidata-Query-Service
dcausse created T243270: Test commons RDF dumps on sdcquery.wmflabs.org.
Tue, Jan 21, 10:11 AM · Wikidata, Discovery-Search (Current work), Wikidata-Query-Service

Jan 17 2020

dcausse added a comment to T243061: Gerrit fails with: Internal server error.

The patch was just merged, I wonder if it's not because of the submodule and trying to detect conflicts with another patch that touches this deleted module (https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/564063).
There might be some logs server-side?

Jan 17 2020, 2:24 PM · Release-Engineering-Team, Gerrit
dcausse renamed T243061: Gerrit fails with: Internal server error from Internal server error gerrit to Gerrit fails with: Internal server error.
Jan 17 2020, 10:35 AM · Release-Engineering-Team, Gerrit
dcausse created T243061: Gerrit fails with: Internal server error.
Jan 17 2020, 10:34 AM · Release-Engineering-Team, Gerrit

Jan 16 2020

dcausse added a comment to T242453: wdqs1005 stopped to handle updates.

icinga check showed: CHECK_NRPE STATE UNKNOWN: Socket timeout after 10 seconds. for Query Service HTTP Port and NaN for WDQS high update lag.

Jan 16 2020, 5:26 PM · Wikidata, Wikidata-Query-Service
dcausse created P10185 blazegraph stuck on wdqs1007.
Jan 16 2020, 5:26 PM
dcausse added a comment to T223046: Lack of case sensitivity with hastemplate:.

I suppose that the last remark refers to the $wgCapitalLinks and $wgCapitalLinkOverrides configuration variables.
When querying, Cirrus properly honors these parameters, so that searching for hastemplate:foo will actually search for Template:Foo on English Wikipedia but Template:foo on English Wiktionary.
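
The behavior described here can be sketched as follows. This is a simplified model: `template_title` is a hypothetical helper, the boolean stands in for the effective value of $wgCapitalLinks (after any $wgCapitalLinkOverrides) for the Template namespace, and real MediaWiki title normalization does more than this.

```python
def template_title(name, capital_links):
    """Resolve the template title that hastemplate:<name> would look for.

    Simplified assumption: only the first-letter capitalization rule is
    modeled here.
    """
    if capital_links and name:
        name = name[0].upper() + name[1:]
    return "Template:" + name

# First letter is capitalized when capital links are enabled.
print(template_title("foo", capital_links=True))
# The name is kept as typed when they are disabled.
print(template_title("foo", capital_links=False))
```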

Jan 16 2020, 9:43 AM · MW-1.35-notes (1.35.0-wmf.19; 2020-02-11), Patch-For-Review, MediaWiki-Search, Discovery-Search (Current work)
dcausse updated subscribers of T240559: Expose ORES drafttopic data in ElasticSearch via a custom CirrusSearch keyword.

Indeed, the only keyword that does some filtering but also affects ranking is morelike, but I'm not sure we can base any naming pattern on it. about-topic: sounds fine to me (@TJones might have some suggestions perhaps?).

Jan 16 2020, 8:54 AM · Growth-Team (Current Sprint), Scoring-platform-team, MediaWiki-extensions-ORES, Discovery-Search, NewcomerTasks 1.1

Jan 13 2020

dcausse added a comment to T242640: query/wikidata/gui jenkins build broken.

very similar to T242587

Jan 13 2020, 5:02 PM · Wikidata Query UI, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), User-Addshore, Wikidata
dcausse triaged T242640: query/wikidata/gui jenkins build broken as High priority.
Jan 13 2020, 4:59 PM · Wikidata Query UI, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), User-Addshore, Wikidata
dcausse created T242640: query/wikidata/gui jenkins build broken.
Jan 13 2020, 4:55 PM · Wikidata Query UI, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), User-Addshore, Wikidata
dcausse created T242624: Grant access to archiva-deployers for mstyles.
Jan 13 2020, 3:35 PM · Analytics, Operations
dcausse updated subscribers of T242622: Grant access to archiva-deployers for zpapierski.
Jan 13 2020, 3:32 PM · Analytics, Operations
dcausse created T242622: Grant access to archiva-deployers for zpapierski.
Jan 13 2020, 3:31 PM · Analytics, Operations
dcausse moved T236296: Create new message for overwritten queries when the original query has zero results from To Be Deployed to Done on the Discovery-Search (Current work) board.
Jan 13 2020, 10:55 AM · MW-1.35-notes (1.35.0-wmf.14; 2020-01-07), Discovery-Search (Current work)
dcausse moved T241948: Search giving inappropriate "Showing results for ... Search instead for ...." from To Be Deployed to Done on the Discovery-Search (Current work) board.
Jan 13 2020, 10:55 AM · Discovery-Search (Current work), CirrusSearch

Jan 11 2020

dcausse added a comment to T240559: Expose ORES drafttopic data in ElasticSearch via a custom CirrusSearch keyword.

Perhaps prefer-topic:something then?
My concern here is mostly to avoid using existing words in the special syntax, so that we do not swallow queries that are valid sentences. For instance, when I copy/paste a text and search for it, e.g. Special topic: Electric aircraft, I probably don't mean the keyword.
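
To make the concern concrete, here is a toy keyword extractor. It is not the CirrusSearch query parser, and `extract_keyword` is a hypothetical function; it only shows how a keyword that is also an ordinary word can capture part of a pasted sentence, while a less common name does not.

```python
import re

def extract_keyword(query, keywords):
    """Return (keyword, value) for the first keyword found in the query.

    Toy matcher: a keyword is recognized when it starts the query or
    follows whitespace and is immediately followed by a colon.
    """
    for keyword in keywords:
        match = re.search(r"(?:^|\s)" + re.escape(keyword) + r":\s*(.+)", query)
        if match:
            return keyword, match.group(1)
    return None

pasted = "Special topic: Electric aircraft"

# An ordinary word as keyword swallows part of the sentence.
print(extract_keyword(pasted, ["topic"]))
# A less ambiguous name leaves the sentence alone.
print(extract_keyword(pasted, ["prefer-topic"]))
```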

Jan 11 2020, 10:37 AM · Growth-Team (Current Sprint), Scoring-platform-team, MediaWiki-extensions-ORES, Discovery-Search, NewcomerTasks 1.1

Jan 10 2020

dcausse created T242453: wdqs1005 stopped to handle updates.
Jan 10 2020, 7:56 PM · Wikidata, Wikidata-Query-Service
dcausse created P10117 blazegraph stuck on wdqs1005.
Jan 10 2020, 7:50 PM

Jan 8 2020

dcausse committed rDPOM4cbe35a0b703: [maven-release-plugin] prepare for next development iteration (authored by dcausse).
Jan 8 2020, 2:57 PM
dcausse committed rDPOMeb342fc04068: [maven-release-plugin] prepare release discovery-parent-pom-1.31 (authored by dcausse).
Jan 8 2020, 2:57 PM
dcausse committed rDMTC584af9dda6cb: [maven-release-plugin] prepare for next development iteration (authored by dcausse).
Jan 8 2020, 12:57 PM
dcausse committed rDMTCc1b789587cad: [maven-release-plugin] prepare release discovery-maven-tool-configs-1.11 (authored by dcausse).
Jan 8 2020, 12:56 PM

Jan 7 2020

dcausse updated the title for P10073 mjolnir scala violation from m to mjolnir scala violation.
Jan 7 2020, 7:04 PM
dcausse created P10073 mjolnir scala violation.
Jan 7 2020, 7:04 PM
dcausse closed T241487: deployment-mediawiki-07: Search backend error during {queryType} search for '{query}' after {tookMs}: {error_message} as Resolved.
Jan 7 2020, 8:43 AM · Discovery-Search (Current work), CirrusSearch, Beta-Cluster-Infrastructure
dcausse moved T241487: deployment-mediawiki-07: Search backend error during {queryType} search for '{query}' after {tookMs}: {error_message} from Needs review to Done on the Discovery-Search (Current work) board.

Search was broken, the config change fixed it.
logstash-beta seems to have stopped receiving events since Jan 1st 16:40, so I can't be really sure that the logspam stopped. Please reopen if you still see errors of this kind.

Jan 7 2020, 8:43 AM · Discovery-Search (Current work), CirrusSearch, Beta-Cluster-Infrastructure

Jan 6 2020

dcausse updated the task description for T241969: Wrong suggestions shown by glent M0.
Jan 6 2020, 8:50 AM · Discovery-Search, CirrusSearch
dcausse moved T241948: Search giving inappropriate "Showing results for ... Search instead for ...." from In Progress to To Be Deployed on the Discovery-Search (Current work) board.
Jan 6 2020, 8:47 AM · Discovery-Search (Current work), CirrusSearch
dcausse moved T241948: Search giving inappropriate "Showing results for ... Search instead for ...." from needs triage to Current work on the Discovery-Search board.
Jan 6 2020, 8:47 AM · Discovery-Search (Current work), CirrusSearch
dcausse edited projects for T241969: Wrong suggestions shown by glent M0, added: Discovery-Search; removed Discovery-Search (Current work).
Jan 6 2020, 8:47 AM · Discovery-Search, CirrusSearch
dcausse moved T241969: Wrong suggestions shown by glent M0 from In Progress to To Be Deployed on the Discovery-Search (Current work) board.
Jan 6 2020, 8:46 AM · Discovery-Search, CirrusSearch
dcausse moved T241969: Wrong suggestions shown by glent M0 from needs triage to Current work on the Discovery-Search board.
Jan 6 2020, 8:46 AM · Discovery-Search, CirrusSearch
dcausse triaged T241948: Search giving inappropriate "Showing results for ... Search instead for ...." as High priority.
Jan 6 2020, 8:46 AM · Discovery-Search (Current work), CirrusSearch
dcausse added a comment to T241948: Search giving inappropriate "Showing results for ... Search instead for ....".

Most likely already fixed in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/554603 but not yet deployed (should be deployed during this week's train).
For 2020 primaries vs 2016 primaries there seems to be an additional problem (the user query should not be corrected in this case); filing a new task for this (please see T241969).

Jan 6 2020, 8:45 AM · Discovery-Search (Current work), CirrusSearch
dcausse created T241969: Wrong suggestions shown by glent M0.
Jan 6 2020, 8:43 AM · Discovery-Search, CirrusSearch
dcausse updated the task description for T241128: EPIC: Reduce the time needed to do the initial WDQS import.
Jan 6 2020, 7:56 AM · Wikidata-Query-Service, Wikidata

Jan 3 2020

dcausse added a comment to T241265: Search find a section name but not a page name.

Indeed! =)
In fact, I didn’t expect the stemmer to be able to look for “venu” instead of “venir”, but I would think CirrusSearch would try to remove the “venir” keyword to find articles where all other words are in the title.
(Before your explanation, I believed it already did this to find the Sheila article, so I found that behavior inconsistent.)

Jan 3 2020, 2:10 PM · CirrusSearch, Discovery-Search, Discovery
dcausse added a comment to T241265: Search find a section name but not a page name.

One problem is that the French stemmer does not conflate venir with its conjugated form venu or venue.
The page Je suis venu te dire que je m'en vais does not contain venir, meaning that it cannot match the query Je suis venir te dire que je m'en vais.

That makes sense, but how does it manage to find “Sheila (section Je suis venue te dire que je m'en vais (1989))” as the first result?

Jan 3 2020, 8:29 AM · CirrusSearch, Discovery-Search, Discovery

Jan 2 2020

dcausse placed T238130: Create a ES wildcard/prefix/fuzzy query that supports normalization and max_determinized_states (extra plugin) up for grabs.
Jan 2 2020, 3:05 PM · CirrusSearch, Discovery-Search
dcausse triaged T219405: Mjolnir needs to respect cirrussearch frozen indices flag as Medium priority.
Jan 2 2020, 2:35 PM · Discovery-Search
dcausse triaged T240453: EPIC: Improve completion search on wikidata as Medium priority.
Jan 2 2020, 2:34 PM · Wikidata, Epic, CirrusSearch, Discovery-Search
dcausse moved T240453: EPIC: Improve completion search on wikidata from needs triage to Wikidata Search on the Discovery-Search board.
Jan 2 2020, 2:34 PM · Wikidata, Epic, CirrusSearch, Discovery-Search
dcausse moved T240778: "We could not complete your search due to a temporary problem." searching words on Minangkabau Wiktionary from needs triage to Current work on the Discovery-Search board.
Jan 2 2020, 2:30 PM · Discovery-Search (Current work), Wikimedia-General-or-Unknown
dcausse triaged T241265: Search find a section name but not a page name as Medium priority.

One problem is that the French stemmer does not conflate venir with its conjugated form venu or venue.
The page Je suis venu te dire que je m'en vais does not contain venir, meaning that it cannot match the query Je suis venir te dire que je m'en vais.

Jan 2 2020, 2:21 PM · CirrusSearch, Discovery-Search, Discovery
dcausse moved T241437: Restore descriptions in opensearch API from needs triage to watching / waiting on the Discovery-Search board.
Jan 2 2020, 1:56 PM · MediaWiki-Search, Discovery-Search
dcausse moved T241421: Sustained periods (2-4h) of bad latency on production-search eqiad from In Progress to Waiting on the Discovery-Search (Current work) board.
Jan 2 2020, 1:44 PM · Discovery-Search (Current work), Patch-For-Review, Operations, Traffic, Performance Issue, Elasticsearch
dcausse moved T241421: Sustained periods (2-4h) of bad latency on production-search eqiad from needs triage to Current work on the Discovery-Search board.
Jan 2 2020, 1:44 PM · Discovery-Search (Current work), Patch-For-Review, Operations, Traffic, Performance Issue, Elasticsearch
dcausse edited projects for T241485: [_field_stats] endpoint is deprecated! Use [_field_caps] instead or run a min/max aggregations on the desired fields., added: Wikimedia-Logstash; removed Elasticsearch, Discovery-Search.
Jan 2 2020, 1:43 PM · Wikimedia-Logstash, Beta-Cluster-Infrastructure
dcausse triaged T241487: deployment-mediawiki-07: Search backend error during {queryType} search for '{query}' after {tookMs}: {error_message} as Medium priority.
Jan 2 2020, 11:32 AM · Discovery-Search (Current work), CirrusSearch, Beta-Cluster-Infrastructure
dcausse claimed T241487: deployment-mediawiki-07: Search backend error during {queryType} search for '{query}' after {tookMs}: {error_message}.
Jan 2 2020, 11:32 AM · Discovery-Search (Current work), CirrusSearch, Beta-Cluster-Infrastructure
dcausse moved T241487: deployment-mediawiki-07: Search backend error during {queryType} search for '{query}' after {tookMs}: {error_message} from needs triage to Current work on the Discovery-Search board.
Jan 2 2020, 11:25 AM · Discovery-Search (Current work), CirrusSearch, Beta-Cluster-Infrastructure
dcausse moved T241582: GeoData needs use of global $wgUser replaced from In Progress to Needs review on the Discovery-Search (Current work) board.
Jan 2 2020, 10:16 AM · MW-1.35-notes (1.35.0-wmf.16; 2020-01-21), Discovery-Search (Current work), GeoData, User-DannyS712
dcausse moved T241582: GeoData needs use of global $wgUser replaced from needs triage to Current work on the Discovery-Search board.
Jan 2 2020, 10:12 AM · MW-1.35-notes (1.35.0-wmf.16; 2020-01-21), Discovery-Search (Current work), GeoData, User-DannyS712
dcausse renamed T219405: Mjolnir needs to respect cirrussearch frozen indices flag from Mjolnir needs to resepect cirrussearch frozen indices flag to Mjolnir needs to respect cirrussearch frozen indices flag.
Jan 2 2020, 10:01 AM · Discovery-Search
dcausse added a comment to T240559: Expose ORES drafttopic data in ElasticSearch via a custom CirrusSearch keyword.

I suggest a keyword slightly less ambiguous such as hastopic or hasdrafttopic.
I agree that there should be a mapping: if this keyword is going to be used directly by users, it might be helpful to let them search for a topic translated into the wiki language instead of using English.

Jan 2 2020, 9:40 AM · Growth-Team (Current Sprint), Scoring-platform-team, MediaWiki-extensions-ORES, Discovery-Search, NewcomerTasks 1.1

Dec 31 2019

dcausse added a project to T241421: Sustained periods (2-4h) of bad latency on production-search eqiad: Traffic.

I believe this is caused by a bot sending a large number of requests of the form:
/w/api.php?format=json&action=query&prop=revisions&list=search&srsearch=search+query
using the UA: wikipedia (https://github.com/goldsmith/Wikipedia/)

Dec 31 2019, 5:15 PM · Discovery-Search (Current work), Patch-For-Review, Operations, Traffic, Performance Issue, Elasticsearch

Dec 20 2019

dcausse moved T237285: Convert the SearchQuery and its AST into an elasticsearch query from In Progress to Waiting on the Discovery-Search (Current work) board.
Dec 20 2019, 12:57 PM · Patch-For-Review, Discovery-Search (Current work), CirrusSearch
dcausse added a comment to T241213: Organize and improve integration test coverage for WDQS Updater.

The most annoying integration test (and probably slowest) is org.wikidata.query.rdf.tool.wikibase.WikibaseRepositoryIntegrationTest:

  • It generates anonymous edits to test.wikidata.org in order to test the RecentChange API.
  • Concurrent runs of this test will cause failures. The test expects to see the timestamp of the edits it makes; if it is run concurrently (two patches in CI), it's a race and can fail.
  • It adds a lot of complexity to test the robustness (retries) by launching a custom Proxy prior to running the integration tests (start-proxy and org.wikidata.query.rdf.tool.Proxy).
Dec 20 2019, 9:33 AM · Test-Coverage, Wikidata, Wikidata-Query-Service

Dec 19 2019

dcausse updated the task description for T241128: EPIC: Reduce the time needed to do the initial WDQS import.
Dec 19 2019, 10:46 AM · Wikidata-Query-Service, Wikidata
dcausse updated the task description for T241128: EPIC: Reduce the time needed to do the initial WDQS import.
Dec 19 2019, 10:45 AM · Wikidata-Query-Service, Wikidata
dcausse updated the task description for T241128: EPIC: Reduce the time needed to do the initial WDQS import.
Dec 19 2019, 10:43 AM · Wikidata-Query-Service, Wikidata
dcausse created T241128: EPIC: Reduce the time needed to do the initial WDQS import.
Dec 19 2019, 10:32 AM · Wikidata-Query-Service, Wikidata
dcausse created T241125: Import wikidata RDF dump to hadoop.
Dec 19 2019, 10:02 AM · Wikidata, Wikidata-Query-Service

Dec 18 2019

dcausse moved T236296: Create new message for overwritten queries when the original query has zero results from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Dec 18 2019, 12:58 PM · MW-1.35-notes (1.35.0-wmf.14; 2020-01-07), Discovery-Search (Current work)
dcausse created P9932 blazegraph server cmdline.
Dec 18 2019, 9:06 AM

Dec 17 2019

dcausse moved T240550: Add mapping for ORES topic field in ElasticSearch from In Progress to Needs review on the Discovery-Search (Current work) board.
Dec 17 2019, 6:35 PM · MW-1.35-notes (1.35.0-wmf.15; 2020-01-14), Discovery-Search (Current work)
dcausse awarded T240540: Investigate usage of the query service & queries that are run a Like token.
Dec 17 2019, 4:46 PM · Wikidata-Query-Service, User-Addshore, Wikidata
dcausse added a comment to T240540: Investigate usage of the query service & queries that are run.

Also T239852

Dec 17 2019, 1:38 PM · Wikidata-Query-Service, User-Addshore, Wikidata

Dec 16 2019

dcausse added a comment to T230495: Partition CirrusSearch mediawiki jobs by cluster.

@Mholloway yes it is expected, previously this topic was only used to replay failed updates to elasticsearch.
As Erik mentioned in a previous comment:

There will now be, approximately, 3x as many ElasticaWrite jobs as there were CirrusSearchLinksUpdate jobs. Ballpark estimate on latency is 300ms, basically dividing the current 700ms by three and rounding up a bit. We almost certainly need to increase concurrency here, using the current level of links update (300) is almost certainly safe, and we can adjust from there.
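
The ballpark figure in this quote can be reproduced with a short calculation. Rounding up to the next 100 ms is an assumption about what "rounding up a bit" means; the quote does not say how the rounding was done.

```python
import math

current_latency_ms = 700   # current latency mentioned in the quote
split_factor = 3           # roughly 3x as many, smaller jobs

# Dividing the current latency by three gives the raw per-job figure.
per_job_ms = current_latency_ms / split_factor

# Assumed rounding rule: round up to the next multiple of 100 ms.
estimate_ms = math.ceil(per_job_ms / 100) * 100

print(round(per_job_ms, 1), estimate_ms)
```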

Dec 16 2019, 9:44 PM · MW-1.35-notes (1.35.0-wmf.8; 2019-11-26), Discovery-Search (Current work), Core Platform Team Workboards (Clinic Duty Team), Cloud-Services, Elasticsearch, Discovery
dcausse claimed T239750: org.wikidata.query.rdf.tool.Updater - Importer error: ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access.
Dec 16 2019, 6:31 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse moved T239750: org.wikidata.query.rdf.tool.Updater - Importer error: ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access from In Progress to Done on the Discovery-Search (Current work) board.
Dec 16 2019, 6:31 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse added a comment to T240518: Some jobs are not being processed / are processed slowly.

Looking at the graph "Rate of committed offset increment" from https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?orgId=1 it seems that only "low_traffic_jobs" are affected, dropping from ~40 to 3.
With one topic (fetchGoogleCloudVisionAnnotations) constantly failing out of many that should run properly (all the ones consumed by low_traffic_jobs), I suppose it could lead to such behavior if ChangeProp does not properly handle such a scenario.

Dec 16 2019, 3:18 PM · Wikimedia-Incident, SDC-Statements (Machine-vision-depicts), MachineVision, Product-Infrastructure-Team-Backlog, Structured-Data-Backlog, MW-1.35-notes (1.35.0-wmf.10; 2019-12-10), Core Platform Team Workboards (Clinic Duty Team), WMF-JobQueue, Operations
dcausse created P9872 Blazegraph multiSync response.
Dec 16 2019, 9:29 AM

Dec 12 2019

dcausse added a comment to T238002: WDQS Munger should be multi threaded.

Separation of

  • parsing
  • munging
  • writing
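
One way to picture the separation listed above is a three-stage pipeline connected by queues, with parsing and munging each running in its own thread and writing done by the caller. This is only a sketch of the general idea in Python; the actual munger is part of the Java wikidata/query/rdf code base, and `run_pipeline`, `parse` and `munge` are hypothetical names.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream

def _stage(work, inbox, outbox):
    """Apply `work` to each item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(work(item))

def run_pipeline(lines, parse, munge):
    """Run parse and munge in separate threads; collect the written output."""
    raw, parsed, munged = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=_stage, args=(parse, raw, parsed)),
        threading.Thread(target=_stage, args=(munge, parsed, munged)),
    ]
    for worker in workers:
        worker.start()
    for line in lines:
        raw.put(line)
    raw.put(_DONE)

    written = []  # stands in for the writing stage
    while True:
        item = munged.get()
        if item is _DONE:
            break
        written.append(item)
    for worker in workers:
        worker.join()
    return written

# Placeholder stages: split a line into terms, then freeze them as a tuple.
result = run_pipeline(["a b c", "d e f"], parse=str.split, munge=tuple)
print(result)
```

Because each stage has a single worker reading from a FIFO queue, the output order matches the input order; using several workers per stage would need extra care to preserve it.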
Dec 12 2019, 7:22 PM · Wikidata, Wikidata-Query-Service
dcausse triaged T239908: Extract more metrics from blazegraph sparql update response as Medium priority.
Dec 12 2019, 8:45 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse added a project to T239908: Extract more metrics from blazegraph sparql update response: Discovery-Search (Current work).
Dec 12 2019, 8:44 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service