
Partition CirrusSearch mediawiki jobs by cluster
Open, High, Public


The recent deployment of cloudelastic, the third Elasticsearch cluster we write to, has made our MediaWiki job response times for writes incredibly erratic. Additionally, cloudelastic isn't nearly as powerful (~1/10 the size) and can't always keep up with the full update rate. To support this use case, we want to partition these jobs so that each cluster can be written to independently.

The overall goal is to allow cloudelastic to fall behind and catch back up at its own pace, independently of the primary clusters. Any slowdown in cloudelastic needs to have little, if any, impact on writes to the primary clusters.
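
As a rough illustration of the goal, here is a minimal sketch in Python (not the actual job queue code) of what per-cluster partitioning buys us. The class, its methods, and the cluster names `primary-a` and `primary-b` are hypothetical; only `cloudelastic` comes from this task. Each cluster gets its own backlog, so a slow consumer only grows its own lag.

```python
from collections import deque


class PartitionedWriteQueue:
    """Hypothetical model: one FIFO backlog per cluster."""

    def __init__(self, clusters):
        self.backlogs = {name: deque() for name in clusters}

    def enqueue(self, update):
        # Fan out: every cluster receives its own copy of the write job.
        for backlog in self.backlogs.values():
            backlog.append(update)

    def drain(self, cluster, max_jobs):
        # Process up to max_jobs for one cluster, leaving the others untouched.
        backlog = self.backlogs[cluster]
        written = []
        while backlog and len(written) < max_jobs:
            written.append(backlog.popleft())
        return written

    def lag(self, cluster):
        # Number of writes this cluster still has to catch up on.
        return len(self.backlogs[cluster])


queue = PartitionedWriteQueue(["primary-a", "primary-b", "cloudelastic"])
for update_id in range(10):
    queue.enqueue(update_id)

queue.drain("primary-a", max_jobs=10)    # a primary keeps up with the full rate
queue.drain("cloudelastic", max_jobs=1)  # the smaller cluster falls behind
```

In this model the lag on `cloudelastic` never affects how quickly `primary-a` drains, which is the property we are after.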

Event Timeline

Restricted Application edited projects, added Discovery-Search; removed Discovery-Search (Current work). Aug 14 2019, 4:08 PM
Pchelolo triaged this task as High priority. Aug 14 2019, 5:15 PM
bd808 renamed this task from Partition CirrusSerch mediawiki jobs by cluster to Partition CirrusSearch mediawiki jobs by cluster. Aug 16 2019, 9:08 AM
bd808 updated the task description.

@Pchelolo is there an action for Core Platform on this?

@kchapman yes. After the search team makes the jobs ready to be partitioned.

EBernhardson added a comment. Edited Tue, Aug 20, 3:47 PM

We need to rework our updater a bit so that the expensive shared work happens once, before the jobs are partitioned, while the ContentHandler data is pulled per partition. It shouldn't be that much work, but it needs to be done on our end so that the cirrusSearchElasticaWrite job can be partitioned.
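
To make the intended split concrete, here is a rough sketch in Python rather than the real PHP updater. All function names, the job parameter layout, and the cluster name `primary-a` are hypothetical stand-ins. Only the cirrusSearchElasticaWrite job name and the idea of pulling ContentHandler data per partition come from this task.

```python
def build_shared_doc(page_id):
    # Stand-in for the expensive work that should run once per update.
    return {"page_id": page_id, "title": f"Page {page_id}"}


def fetch_content_fields(page_id, cluster):
    # Stand-in for the ContentHandler data, pulled inside each partition.
    return {"text": f"content of page {page_id}"}


def plan_write_jobs(page_id, clusters):
    # Do the shared work once, then emit one write job per cluster.
    shared = build_shared_doc(page_id)
    return [
        {"type": "cirrusSearchElasticaWrite", "cluster": cluster, "doc": dict(shared)}
        for cluster in clusters
    ]


def run_write_job(job):
    # Each partition completes its own copy of the document before writing.
    doc = dict(job["doc"])
    doc.update(fetch_content_fields(doc["page_id"], job["cluster"]))
    return job["cluster"], doc


jobs = plan_write_jobs(42, ["primary-a", "cloudelastic"])
cluster, doc = run_write_job(jobs[1])
```

Because each job carries only the shared part, the per-cluster jobs stay small and can be retried or delayed without affecting each other.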