dcausse (David Causse)
User

User Details

User Since
Jun 9 2015, 9:03 AM (281 w, 3 d)
Availability
Available
IRC Nick
dcausse
LDAP User
DCausse
MediaWiki User
DCausse (WMF) [ Global Accounts ]

Recent Activity

Today

dcausse removed a project from T266850: CategoryChangesAsRdfTest::testCategorization: Failed asserting that two strings are equal.: Wikidata.
Fri, Oct 30, 4:40 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Wikidata-Query-Service, MediaWiki-General, ci-test-error (WMF-deployed Build Failure)
dcausse edited projects for T266850: CategoryChangesAsRdfTest::testCategorization: Failed asserting that two strings are equal., added: Wikidata-Query-Service; removed Discovery.
Fri, Oct 30, 4:39 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Wikidata-Query-Service, MediaWiki-General, ci-test-error (WMF-deployed Build Failure)

Yesterday

dcausse renamed T266750: The streaming updater consumer should stop accumulating patches if it cannot handle them from The streaming updater consumer should stop accumulating patches if it cannot handle to The streaming updater consumer should stop accumulating patches if it cannot handle them.
Thu, Oct 29, 3:14 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse moved T266750: The streaming updater consumer should stop accumulating patches if it cannot handle them from Incoming to Needs review on the Discovery-Search (Current work) board.
Thu, Oct 29, 2:48 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse claimed T266750: The streaming updater consumer should stop accumulating patches if it cannot handle them.
Thu, Oct 29, 2:47 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse created T266762: The sanitizer is a lot slower than when running in codfw (oct 27 2020 codfw -> eqiad switchover).
Thu, Oct 29, 9:33 AM · Discovery-Search, CirrusSearch
dcausse created T266751: The streaming updater should identify all shared statements properly.
Thu, Oct 29, 9:13 AM · Patch-For-Review, Wikidata, Wikidata-Query-Service
dcausse created T266750: The streaming updater consumer should stop accumulating patches if it cannot handle them.
Thu, Oct 29, 9:06 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Wed, Oct 28

dcausse added a comment to T262942: PoC on anomaly detection with Flink.

Yes, it can definitely support such queries, e.g. extract all API requests from mediawiki.apiaction, grouped by their action param and database, where the average backend time is > 100ms over a 1-minute window.
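
As a rough illustration of the kind of query described above, here is a minimal pure-Python sketch (not Flink code) of a one-minute tumbling-window aggregation. The field names `ts`, `action`, `database` and `backend_time_ms`, as well as the sample events, are assumptions made for illustration and are not the actual mediawiki.apiaction schema.

```python
from collections import defaultdict

WINDOW_SECONDS = 60
THRESHOLD_MS = 100.0


def slow_groups(events, window_seconds=WINDOW_SECONDS, threshold_ms=THRESHOLD_MS):
    """Return {(window_start, action, database): avg_ms} for every group whose
    average backend time exceeds the threshold within a tumbling window."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [total_ms, count]
    for e in events:
        # Align the event timestamp to the start of its tumbling window.
        window_start = e["ts"] - (e["ts"] % window_seconds)
        key = (window_start, e["action"], e["database"])
        sums[key][0] += e["backend_time_ms"]
        sums[key][1] += 1
    return {
        key: total / count
        for key, (total, count) in sums.items()
        if total / count > threshold_ms
    }


# Hypothetical sample events (ts is in seconds).
events = [
    {"ts": 0, "action": "query", "database": "enwiki", "backend_time_ms": 150},
    {"ts": 30, "action": "query", "database": "enwiki", "backend_time_ms": 90},
    {"ts": 45, "action": "parse", "database": "dewiki", "backend_time_ms": 40},
    {"ts": 70, "action": "query", "database": "enwiki", "backend_time_ms": 80},
]
result = slow_groups(events)
```

A streaming engine would emit each window's result as the window closes rather than after reading all events; the grouping and filtering logic is the same.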

Wed, Oct 28, 2:47 PM · Discovery-Search (Current work), Analytics-Radar, Wikidata, Wikidata-Query-Service

Tue, Oct 27

dcausse placed T190132: Introduce keyword interfaces dedicated per usage up for grabs.
Tue, Oct 27, 4:44 PM · Discovery-Search, MW-1.32-notes (WMF-deploy-2018-05-08 (1.32.0-wmf.3)), Discovery, CirrusSearch

Mon, Oct 26

dcausse moved T262845: Investigate SearchSatisfaction mismatched test buckets from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Oct 26, 6:12 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work)
dcausse moved T265455: SearchSatisfaction instrumentation should cleanup the search URL from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Oct 26, 6:12 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work), CirrusSearch
dcausse claimed T255657: Strange result in Wikidata query (full URLs given instead of identifiers).
Mon, Oct 26, 6:09 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse moved T265452: Add a configurable restart strategy to the streaming updater from In Progress to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Oct 26, 6:08 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse moved T263952: mwapi calls rarely return results from In Progress to To Be Deployed on the Discovery-Search (Current work) board.
Mon, Oct 26, 6:07 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse added a comment to T265967: Proposal: drop avro dependency from mediawiki.

+1

Mon, Oct 26, 4:44 PM · Analytics-Radar, Discovery-Search, MediaWiki-General
dcausse closed T266070: wdqs updater failing on parse error as Declined.

There's nothing to fix in the updater for this ticket; the cause was a bad response from one MW machine.

Mon, Oct 26, 4:40 PM · Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
dcausse closed T189877: Cirrussearch Error "An error has occurred while searching" due to comma used as decimal separator as Resolved.

@Aschroet thanks for the reply, closing as it seems you found a workaround.
Please feel free to re-open if you think there's still a fix to be made to Cirrus.

Mon, Oct 26, 4:35 PM · Discovery-Search, Regression, CirrusSearch
dcausse moved T265056: Cirrus Search dumps failed for some wikis from Bugs to elastic / cirrus on the Discovery-Search board.

It would be great to make this process more robust to connection issues but for this I think we should move away from the scroll API to fetch documents.

Mon, Oct 26, 4:22 PM · Discovery-Search, CirrusSearch, Dumps-Generation
dcausse created T266470: Expose wdqs1009 to wdqs users and gather feedback.
Mon, Oct 26, 1:21 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Fri, Oct 23

dcausse added a subtask for T244590: [Epic] Rework the WDQS updater as an event driven application: T266321: Determine flink metrics configuration and backend when running from k8s.
Fri, Oct 23, 10:17 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service, Epic
dcausse added a parent task for T266321: Determine flink metrics configuration and backend when running from k8s: T244590: [Epic] Rework the WDQS updater as an event driven application.
Fri, Oct 23, 10:17 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse created T266321: Determine flink metrics configuration and backend when running from k8s.
Fri, Oct 23, 10:16 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse added a subtask for T244590: [Epic] Rework the WDQS updater as an event driven application: T266318: Clarify dependencies on codehale dropwizards.
Fri, Oct 23, 10:01 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service, Epic
dcausse added a parent task for T266318: Clarify dependencies on codehale dropwizards: T244590: [Epic] Rework the WDQS updater as an event driven application.
Fri, Oct 23, 10:00 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse updated the task description for T266318: Clarify dependencies on codehale dropwizards.
Fri, Oct 23, 10:00 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse created T266318: Clarify dependencies on codehale dropwizards.
Fri, Oct 23, 9:11 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Thu, Oct 22

dcausse moved T255657: Strange result in Wikidata query (full URLs given instead of identifiers) from Waiting to In Progress on the Discovery-Search (Current work) board.

All the revisions I manually checked were created on the same day, 2020-06-12, before mw1384 was depooled. I'm trying to extract a full list from one server, but I'm having a hard time keeping Blazegraph from failing:

Thu, Oct 22, 4:17 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse added a comment to T255657: Strange result in Wikidata query (full URLs given instead of identifiers).

The revision reported in T266211 was created on 2020-06-12T06:36:58Z which also coincides with the date of problems identified in T264042.
Looking at the logs, we seem to have had trouble with an MW machine at that time: T255282, which relates to the opcache issue and the RDF code in Wikibase.

Thu, Oct 22, 1:29 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Wed, Oct 21

dcausse moved T263952: mwapi calls rarely return results from Waiting to In Progress on the Discovery-Search (Current work) board.

Resuming the investigation: additional logs suggest that the Jetty HTTP client (or the way we use it) is to blame.

Wed, Oct 21, 3:57 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse claimed T265452: Add a configurable restart strategy to the streaming updater.
Wed, Oct 21, 10:42 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse added a project to T266070: wdqs updater failing on parse error: Wikidata-Query-Service.
Wed, Oct 21, 9:08 AM · Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
dcausse added a comment to T245183: PHP7 corruption reports in 2020 (Call on wrong object, etc.).

must be an instance of Vikibase\Search\ [...]

Is this some sort of copying glitch?

Wed, Oct 21, 9:02 AM · Wikimedia-production-error, serviceops, Operations
dcausse added a comment to T245183: PHP7 corruption reports in 2020 (Call on wrong object, etc.).
  • timestamp: 2020-10-20T18:10:00 to 2020-10-20T21:15:00
  • host: mw2252
  • message:

[2b171d8b-48ec-480d-b7a4-187dd3af259c] /w/api.php?titles=Image%3ANorrlands_nation_Nya_entr%C3%A9n2.jpg&iiprop=url&iiurlwidth=120&iiurlheight=120&prop=imageinfo&format=json&action=query TypeError from line 395 of /srv/mediawiki/php-1.36.0-wmf.13/extensions/WikibaseCirrusSearch/src/Hooks.php: Return value of Wikibase\Search\Elastic\Hooks::getWBCSConfig() must be an instance of Vikibase\Search\Elastic\WikibaseSearchConfig, instance of Vikibase\Search\Elastic\WikibaseSearchConfig returned

Wed, Oct 21, 8:53 AM · Wikimedia-production-error, serviceops, Operations
dcausse added a comment to T266070: wdqs updater failing on parse error.

< at line 1 looks suspiciously like an HTML blob being returned while calling the recent changes API. Could it be that the host hit by this request was not fully functional at the time?

Wed, Oct 21, 8:53 AM · Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)

Tue, Oct 20

dcausse moved T266027: Test perfield_builder on spaceless languages from Incoming to Needs review on the Discovery-Search (Current work) board.
Tue, Oct 20, 3:29 PM · Chinese-Sites, Patch-For-Review, Discovery-Search (Current work), CirrusSearch
dcausse moved T265455: SearchSatisfaction instrumentation should cleanup the search URL from In Progress to Needs review on the Discovery-Search (Current work) board.
Tue, Oct 20, 3:27 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work), CirrusSearch
dcausse moved T262845: Investigate SearchSatisfaction mismatched test buckets from In Progress to Needs review on the Discovery-Search (Current work) board.

The 246979 non-matching events are likely due to T265374.
For the 7204, I could only find these two explanations:

  • The user clicks a search link that has a cirrusUserTesting=bucket param attached to it
  • The user reopens their browser with several tabs open, one of which has a link with a cirrusUserTesting=bucket param attached to it
Tue, Oct 20, 3:26 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work)
dcausse updated the task description for T266027: Test perfield_builder on spaceless languages.
Tue, Oct 20, 2:52 PM · Chinese-Sites, Patch-For-Review, Discovery-Search (Current work), CirrusSearch
dcausse created T266027: Test perfield_builder on spaceless languages.
Tue, Oct 20, 2:50 PM · Chinese-Sites, Patch-For-Review, Discovery-Search (Current work), CirrusSearch

Mon, Oct 19

dcausse committed rWDAN3c590e29b660: Fix columns mismatch for discovery.wikibase_item (authored by dcausse).
Fix columns mismatch for discovery.wikibase_item
Mon, Oct 19, 4:30 PM
dcausse placed T265896: Add wbgetentities and wbgetclaims access to MWAPI config up for grabs.
Mon, Oct 19, 3:45 PM · Wikidata-Query-Service, Wikidata
dcausse claimed T265896: Add wbgetentities and wbgetclaims access to MWAPI config.
Mon, Oct 19, 3:36 PM · Wikidata-Query-Service, Wikidata

Wed, Oct 14

dcausse claimed T265455: SearchSatisfaction instrumentation should cleanup the search URL.
Wed, Oct 14, 9:06 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work), CirrusSearch
dcausse moved T265455: SearchSatisfaction instrumentation should cleanup the search URL from needs triage to Current work on the Discovery-Search board.
Wed, Oct 14, 9:05 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work), CirrusSearch
dcausse updated the task description for T265455: SearchSatisfaction instrumentation should cleanup the search URL.
Wed, Oct 14, 9:01 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work), CirrusSearch
dcausse added a project to T265455: SearchSatisfaction instrumentation should cleanup the search URL: CirrusSearch.
Wed, Oct 14, 8:57 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work), CirrusSearch
dcausse added a subtask for T262845: Investigate SearchSatisfaction mismatched test buckets: T265455: SearchSatisfaction instrumentation should cleanup the search URL.
Wed, Oct 14, 8:56 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work)
dcausse added a parent task for T265455: SearchSatisfaction instrumentation should cleanup the search URL: T262845: Investigate SearchSatisfaction mismatched test buckets.
Wed, Oct 14, 8:56 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work), CirrusSearch
dcausse created T265455: SearchSatisfaction instrumentation should cleanup the search URL.
Wed, Oct 14, 8:56 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work), CirrusSearch
dcausse moved T265374: AdvancedSearch should end the current request when redirecting the namespaced search URL from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Wed, Oct 14, 8:44 AM · MW-1.36-notes (1.36.0-wmf.14; 2020-10-20), Advanced-Search, Discovery-Search (Current work)
dcausse added a subtask for T244590: [Epic] Rework the WDQS updater as an event driven application: T265452: Add a configurable restart strategy to the streaming updater.
Wed, Oct 14, 8:16 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service, Epic
dcausse added a parent task for T265452: Add a configurable restart strategy to the streaming updater: T244590: [Epic] Rework the WDQS updater as an event driven application.
Wed, Oct 14, 8:16 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse created T265452: Add a configurable restart strategy to the streaming updater.
Wed, Oct 14, 8:13 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Tue, Oct 13

dcausse added a comment to T265113: Memory issue on elastic1063 caused elasticsearch to be killed.

happened again today:

Tue, Oct 13, 7:45 PM · ops-eqiad, Discovery-Search, Operations
dcausse updated the task description for T265374: AdvancedSearch should end the current request when redirecting the namespaced search URL.
Tue, Oct 13, 4:29 PM · MW-1.36-notes (1.36.0-wmf.14; 2020-10-20), Advanced-Search, Discovery-Search (Current work)
dcausse created T265374: AdvancedSearch should end the current request when redirecting the namespaced search URL.
Tue, Oct 13, 4:21 PM · MW-1.36-notes (1.36.0-wmf.14; 2020-10-20), Advanced-Search, Discovery-Search (Current work)

Mon, Oct 12

dcausse added a comment to T265056: Cirrus Search dumps failed for some wikis.

capturing some logs before they vanish:

Mon, Oct 12, 4:34 PM · Discovery-Search, CirrusSearch, Dumps-Generation
dcausse reopened T189877: Cirrussearch Error "An error has occurred while searching" due to comma used as decimal separator as "Open".

@Aschroet could you append &cirrusDumpQuery to the search URL you obtain when the error occurs and paste its output on the ticket, thanks!
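
For readers who want to script this, a minimal Python sketch of appending the flag to a search URL follows. The example URL and the helper name `with_dump_query` are hypothetical; the sketch emits the flag as `cirrusDumpQuery=` (empty value), on the assumption that only the presence of the parameter matters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_dump_query(search_url):
    """Append the cirrusDumpQuery flag to a search URL, keeping existing params."""
    parts = urlsplit(search_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Only add the flag if it is not already present.
    if not any(name == "cirrusDumpQuery" for name, _ in params):
        params.append(("cirrusDumpQuery", ""))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical search URL, for illustration only.
url = with_dump_query(
    "https://example.org/w/index.php?search=1%2C5&title=Special:Search"
)
```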

Mon, Oct 12, 7:15 AM · Discovery-Search, Regression, CirrusSearch

Fri, Oct 9

dcausse created T265113: Memory issue on elastic1063 caused elasticsearch to be killed.
Fri, Oct 9, 7:24 AM · ops-eqiad, Discovery-Search, Operations

Thu, Oct 8

dcausse added a comment to T262845: Investigate SearchSatisfaction mismatched test buckets.

For 691588 backend events matching a test bucket:

  • 437764 match a SearchSatisfaction searchResultPage event
  • 7204 are inconsistent with their corresponding SearchSatisfaction searchResultPage event (joining on the search token)
  • 246979 have no matching SearchSatisfaction searchResultPage event; only 10 of these match go, the rest are unclear
Thu, Oct 8, 5:04 PM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work)

Tue, Oct 6

dcausse claimed T262845: Investigate SearchSatisfaction mismatched test buckets.
Tue, Oct 6, 9:34 AM · MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Discovery-Search (Current work)
dcausse added a comment to T258054: [M] Tweak impact of quality assessments on results score.

I don't think a formal process exists for changing these values on-wiki. My experience with these values has been:

  • on enwiki, I disabled them through wgCirrusSearchIgnoreOnWikiBoostTemplates because they were incompatible with the switch to BM25; at the time only the original author of CirrusSearch had set them there, so as a CirrusSearch maintainer I took the liberty of disabling them
  • on wikitech, these values are actively maintained by wiki admins
Tue, Oct 6, 7:21 AM · MW-1.36-notes (1.36.0-wmf.13; 2020-10-12), Patch-For-Review, SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Structured-Data-Backlog (Current Work)

Mon, Oct 5

dcausse added a comment to T264618: Update the suggester's content based on existing statements on the item.

What's in the Elasticsearch index could allow some level of filtering/re-ranking based on the context provided.
Currently statements that resolve to time values are not indexed and would be required here.
On the other hand selecting (or ranking higher) entities with proper P31 could be done if the list of P31 items can be inferred easily from the property itself (using P1629 perhaps?).

Mon, Oct 5, 4:59 PM · MediaWiki-extensions-PropertySuggester, Wikidata
dcausse added a comment to T258054: [M] Tweak impact of quality assessments on results score.

If the problem to solve is that pages are tagged with more than one of these templates, I'd suggest the simple approach you proposed (dismax), but by setting score_mode = max in includes/Search/Rescore/BoostTemplatesFunctionScoreBuilder.php. Template boosting is rarely used; I'm sure that most of the time the weights were adjusted with only one matching template in mind, and that this change would benefit the few other wikis using this feature.
If the problem is more about regaining control over template boosting because the way boosts are applied is not compatible with the ranking formula being implemented, I'd suggest setting up a dedicated rescore profile, which will give more flexibility to tune these settings. The issue is that wgCirrusSearchBoostTemplates and wgCirrusSearchIgnoreOnWikiBoostTemplates are global to all query builders.
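
To illustrate why the score mode matters when a page matches more than one boosted template, here is a minimal Python sketch. The template names and weights are hypothetical, and the function only mimics how a function_score-style query combines the weights of matching functions under score_mode = multiply versus score_mode = max; it is not CirrusSearch code.

```python
import math


def combined_boost(page_templates, boosts, score_mode="multiply"):
    """Combine the weights of all boost templates present on a page."""
    weights = [w for template, w in boosts.items() if template in page_templates]
    if not weights:
        return 1.0  # no matching template: neutral boost
    if score_mode == "multiply":
        return math.prod(weights)
    if score_mode == "max":
        return max(weights)
    raise ValueError(f"unsupported score_mode: {score_mode}")


# Hypothetical boost configuration and page.
boosts = {"Template:Featured": 2.0, "Template:Good": 1.5}
page = {"Template:Featured", "Template:Good"}

multiplied = combined_boost(page, boosts, "multiply")
maxed = combined_boost(page, boosts, "max")
```

With multiply, a page carrying both templates gets a compounded boost that the weights were probably never tuned for; with max, only the strongest matching template counts.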

Mon, Oct 5, 2:13 PM · MW-1.36-notes (1.36.0-wmf.13; 2020-10-12), Patch-For-Review, SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Structured-Data-Backlog (Current Work)

Fri, Oct 2

dcausse created P12895 stat1007 scap deploy.
Fri, Oct 2, 8:46 AM
dcausse committed rWDAN5713fb09d0d7: Fix lexeme dumps expected date (authored by dcausse).
Fix lexeme dumps expected date
Fri, Oct 2, 8:23 AM

Thu, Oct 1

dcausse moved T263952: mwapi calls rarely return results from In Progress to Waiting on the Discovery-Search (Current work) board.

The root cause of the problem is still unclear.
I added some more debug logs to continue investigating.
What I know so far is that only codfw was affected and that restarting Blazegraph on an affected node fixed the issue. Some state is probably leaked, but it's not yet clear where: it could be in Blazegraph itself or in the Jetty HTTP client (the additional logging should hopefully help rule out one or the other).

Thu, Oct 1, 6:31 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Wed, Sep 30

abian awarded T263952: mwapi calls rarely return results a Burninate token.
Wed, Sep 30, 8:05 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Sep 30 2020

dcausse created T264164: Cleanup broken dumps in /wikidatawiki/entities/20200921/.
Sep 30 2020, 9:39 AM · Discovery-Search (Current work), Wikidata, Dumps-Generation
dcausse added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

For the record here are some graphs taken over the same period (jun-2020 to sept-2020):

Sep 30 2020, 8:11 AM · Patch-For-Review, Discovery-Search (Current work)

Sep 29 2020

dcausse claimed T263952: mwapi calls rarely return results.
Sep 29 2020, 4:20 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse added a comment to T262942: PoC on anomaly detection with Flink.

Looking at existing Flink-based solutions in this area, I don't think this is a good fit for the Table API and/or SQL unless the use case is relatively simple (i.e. it requires neither fine control over the state nor specific timers).
Most solutions I've seen describe a similar architecture:

  • event ingestion (exactly what eventgate does)
  • flink pipeline:
    • read from existing event sources and possibly join multiple ones
    • key (partitioning)
    • feature extraction (time operation/aggregation/...)
    • anomaly detection (applying rules/models)
  • front-end (alerts/UI)
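
The pipeline stages listed above (key, feature extraction, anomaly detection) can be sketched in plain Python as follows. This is only an illustration of the architecture, not Flink code: the field names `host` and `latency_ms`, the sample events, and the fixed-threshold rule standing in for a real model are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def key_by(events, key_field):
    """Partition events by key (the 'key (partitioning)' stage)."""
    groups = defaultdict(list)
    for event in events:
        groups[event[key_field]].append(event)
    return groups


def extract_features(group):
    """Aggregate one partition into features (the 'feature extraction' stage)."""
    return {
        "count": len(group),
        "avg_latency_ms": mean(e["latency_ms"] for e in group),
    }


def is_anomalous(features, max_avg_latency_ms=100.0):
    """Apply a simple rule (the 'anomaly detection' stage)."""
    return features["avg_latency_ms"] > max_avg_latency_ms


def run_pipeline(events):
    alerts = []
    for key, group in sorted(key_by(events, "host").items()):
        features = extract_features(group)
        if is_anomalous(features):
            alerts.append((key, features))
    return alerts


# Hypothetical events, as they might arrive from an event source.
events = [
    {"host": "a", "latency_ms": 50},
    {"host": "a", "latency_ms": 70},
    {"host": "b", "latency_ms": 200},
    {"host": "b", "latency_ms": 100},
]
alerts = run_pipeline(events)
```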
Sep 29 2020, 3:57 PM · Discovery-Search (Current work), Analytics-Radar, Wikidata, Wikidata-Query-Service
dcausse added a project to T252731: Wikidata nodeID values sometimes start with numbers, causing parsing issues.: Wikidata-Query-Service.

No objections to prefixing a letter or a couple of characters here; the query service munging process can easily be adapted to remove such prefixes when skolemizing the blank nodes.

Sep 29 2020, 1:11 PM · Discovery-Search (Current work), Wikidata-Campsite, Wikidata-Query-Service, Wikidata
dcausse added a comment to T264053: Unsustainable increases in Elasticsearch cluster disk IO.

Something seems to have happened around July 14th; it's particularly visible on https://grafana.wikimedia.org/d/000000607/cluster-overview?orgId=1&var-site=eqiad&var-cluster=elasticsearch&var-instance=All&var-datasource=thanos&from=now-90d&to=now (esp. the temperature and network graphs).
The search thread pool sizes started to rise more regularly after this date as well.

Sep 29 2020, 9:04 AM · Patch-For-Review, Discovery-Search (Current work)

Sep 28 2020

dcausse moved T261841: Tag WDQS query log with the source of the query (UI vs direct access) from Needs review to To Be Deployed on the Discovery-Search (Current work) board.
Sep 28 2020, 5:29 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse moved T262896: <http://wikiba.se/ontology#wikiGroup> triples should be marked as "linked/unlinked shared statements" from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Sep 28 2020, 5:29 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse moved T259115: Import wikidata ttl dumps in a hive table from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board.
Sep 28 2020, 5:28 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
dcausse edited projects for T263970: ElasticSearch unassigned shard check apifeatureusage-2020.06.30@codfw and enwiki_general_1587198756@eqiad, added: Discovery-Search (Current work); removed Discovery-Search.
Sep 28 2020, 8:56 AM · Discovery-Search (Current work)
dcausse renamed T263970: ElasticSearch unassigned shard check apifeatureusage-2020.06.30@codfw and enwiki_general_1587198756@eqiad from ElasticSearch unassigned shard check apifeatureusage-2020.06.30@codfw and enwiki_general_1587198756@codfw to ElasticSearch unassigned shard check apifeatureusage-2020.06.30@codfw and enwiki_general_1587198756@eqiad.
Sep 28 2020, 7:24 AM · Discovery-Search (Current work)
dcausse created T263970: ElasticSearch unassigned shard check apifeatureusage-2020.06.30@codfw and enwiki_general_1587198756@eqiad.
Sep 28 2020, 7:14 AM · Discovery-Search (Current work)
dcausse created P12808 apifeatureusage allocation problem elasticsearch chi@codfw.
Sep 28 2020, 7:13 AM
dcausse created P12807 enwiki_general_1587198756 allocation failure elasticsearch chi@eqiad.
Sep 28 2020, 7:01 AM

Sep 25 2020

dcausse committed rWDAN94c8e6a62902: Set explicit start date (authored by dcausse).
Set explicit start date
Sep 25 2020, 5:17 PM

Sep 23 2020

dcausse claimed T262942: PoC on anomaly detection with Flink.
Sep 23 2020, 1:48 PM · Discovery-Search (Current work), Analytics-Radar, Wikidata, Wikidata-Query-Service
dcausse moved T263596: cirrus SuggestScoringTest randomized testing found failure case from Incoming to Needs review on the Discovery-Search (Current work) board.
Sep 23 2020, 1:12 PM · MW-1.36-notes (1.36.0-wmf.11; 2020-09-29), Discovery-Search (Current work), CirrusSearch
dcausse claimed T263596: cirrus SuggestScoringTest randomized testing found failure case.
Sep 23 2020, 1:12 PM · MW-1.36-notes (1.36.0-wmf.11; 2020-09-29), Discovery-Search (Current work), CirrusSearch
dcausse closed T263110: Investigate the cause of: ChecksumError: offset=517789868032,nbytes=16,expected=-58390144,actual=535102966 while importing wikidata dumps as Declined.

I did not find anything obvious, but looking at the various classes involved in managing the writes I see excessive locking protection and object reuse, especially:

  • WriteCacheService, which keeps and reuses WriteCache instances.
  • WriteCache, which wraps (protects?) access to a ByteBuffer.
  • DirectBufferPool, which according to its comments seems to have issues managing its references: "When DEBUG is true we do not permit a buffer which was not correctly release to be reused", which in other words means that when DEBUG is false we do permit a buffer which was not correctly released to be reused.
Sep 23 2020, 9:07 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Sep 22 2020

dcausse claimed T263110: Investigate the cause of: ChecksumError: offset=517789868032,nbytes=16,expected=-58390144,actual=535102966 while importing wikidata dumps.
Sep 22 2020, 7:36 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Sep 21 2020

dcausse moved T262896: <http://wikiba.se/ontology#wikiGroup> triples should be marked as "linked/unlinked shared statements" from In Progress to Needs review on the Discovery-Search (Current work) board.
Sep 21 2020, 1:02 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse moved T262896: <http://wikiba.se/ontology#wikiGroup> triples should be marked as "linked/unlinked shared statements" from Incoming to In Progress on the Discovery-Search (Current work) board.
Sep 21 2020, 11:58 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse added a project to T262896: <http://wikiba.se/ontology#wikiGroup> triples should be marked as "linked/unlinked shared statements": Discovery-Search (Current work).
Sep 21 2020, 11:58 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse claimed T262896: <http://wikiba.se/ontology#wikiGroup> triples should be marked as "linked/unlinked shared statements".
Sep 21 2020, 11:58 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse moved T261125: Allow domain wikibooks.org from wdqs mwapi service from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Sep 21 2020, 9:46 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
dcausse assigned T258641: Elastica\Exception\ResponseException from line 56 of includes/Searcher.php: to EBernhardson.
Sep 21 2020, 7:51 AM · MW-1.36-notes (1.36.0-wmf.10; 2020-09-22), Discovery-Search (Current work), Release-Engineering-Team (Logspam), GeoData, Wikimedia-production-error

Sep 18 2020

dcausse updated subscribers of T258055: [L] [SPIKE] Investigate traversing entities tree to include more entities with more detail.

I like the idea of using the wikidata graph (via SPARQL) to explore possibilities of pulling interesting data to feed a query expansion engine.
Using WDQS to serve real-time search traffic, on the other hand, is not an option I think (for performance reasons), but I believe it could make sense to create a dedicated dataset using the findings you've made here. This dataset could be used for two purposes:

  • the initial concept lookup (replacing the need to use wikidata fulltext search)
  • the expansion of the concepts following certain paths of the graph like you experimented
Sep 18 2020, 12:47 PM · MW-1.36-notes (1.36.0-wmf.14; 2020-10-20), SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Structured-Data-Backlog (Current Work)

Sep 17 2020

dcausse added a comment to T258641: Elastica\Exception\ResponseException from line 56 of includes/Searcher.php:.

Change 627394 merged by jenkins-bot:
[mediawiki/extensions/GeoData@master] search: Pass through status return values from cirrus to api

https://gerrit.wikimedia.org/r/627394

I am not sure, but this patch fixed something different from what's reported in the stack trace of the task, and so the exception only increased, as reported in T263128.

Sep 17 2020, 3:25 PM · MW-1.36-notes (1.36.0-wmf.10; 2020-09-22), Discovery-Search (Current work), Release-Engineering-Team (Logspam), GeoData, Wikimedia-production-error
dcausse merged task T263128: ResponseException from line 56 of extensions/GeoData/includes/Searcher.php: into T258641: Elastica\Exception\ResponseException from line 56 of includes/Searcher.php:.
Sep 17 2020, 3:21 PM · Patch-For-Review, Discovery-Search, Search-Platform-Programs, Elasticsearch, Wikimedia-production-error
dcausse merged T263128: ResponseException from line 56 of extensions/GeoData/includes/Searcher.php: into T258641: Elastica\Exception\ResponseException from line 56 of includes/Searcher.php:.
Sep 17 2020, 3:21 PM · MW-1.36-notes (1.36.0-wmf.10; 2020-09-22), Discovery-Search (Current work), Release-Engineering-Team (Logspam), GeoData, Wikimedia-production-error
dcausse edited projects for T263132: Could not enqueue jobs from stream mediawiki.job.cirrusSearchIncomingLinkCount, added: Event-Platform, Operations; removed Search-Platform-Programs, CirrusSearch, Discovery-Search.

https://grafana.wikimedia.org/d/ePFPOkqiz/eventgate?orgId=1&refresh=1m&from=now-3h&to=now shows a restart/deploy during this spike so I guess it's not related to the train.

Sep 17 2020, 2:15 PM · Analytics-Kanban, Operations, Event-Platform, Analytics, Wikimedia-production-error