Spent a good part of the day on this in IRC, copying some relevant info here:
- dcausse started a completion suggester build this morning, it ran all day (14 hours and still going)
- Choosing a random day, 20170312, all completion suggester indices built in 8 hours
- enwiki built 5.35M docs from 03:44:26 to 05:22:16
- about 98 minutes, so roughly 54.7k docs/minute
- ebernhardson took a 5 minute tcpdump of all traffic to/from terbium to codfw cluster hosts
- Indexing bulks report an average of ~10ms
- Scrolls report an average of ~100ms, with peaks of ~200ms
- 3088 /_bulk requests over 5 minutes, average content size of 41kB. This is for all 4 wikis currently indexing
- Hard to filter packets to only one wiki, but assuming 1/4 that's 772 bulks per wiki; at 100 docs per bulk that is 77,200 docs per 5 minutes, or 15.4k docs per minute per wiki
- enwiki is 5.35M docs; at 15.4k docs/minute it would take about 5.8 hours.
- 15.4k docs/minute, vs 54k docs/minute previously. Where did the time go?
- 772 bulks in 5 minutes per wiki is 2.57 round trips per second
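For anyone wanting to re-check the arithmetic above, here is a rough sketch in Python. All figures are copied from the notes; the even 1/4 split across wikis and 100 docs per bulk are the same assumptions made above, not measured values, and the variable names are just for illustration.

```python
from datetime import datetime

# Figures copied from the notes above.
ENWIKI_DOCS = 5_350_000
BUILD_START = "03:44:26"
BUILD_END = "05:22:16"
BULK_REQUESTS = 3088      # /_bulk requests seen in the 5 minute capture
CAPTURE_MINUTES = 5
WIKIS_INDEXING = 4        # assumption: traffic is split evenly across wikis
DOCS_PER_BULK = 100       # assumption: every bulk carries a full batch


def minutes_between(start, end):
    """Elapsed minutes between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


# Rate of the 20170312 enwiki build.
old_minutes = minutes_between(BUILD_START, BUILD_END)
old_rate = ENWIKI_DOCS / old_minutes

# Rate estimated from the capture.
bulks_per_wiki = BULK_REQUESTS / WIKIS_INDEXING
new_rate = bulks_per_wiki * DOCS_PER_BULK / CAPTURE_MINUTES
bulks_per_second = bulks_per_wiki / (CAPTURE_MINUTES * 60)
hours_for_enwiki = ENWIKI_DOCS / new_rate / 60

print(f"previous build: {old_minutes:.1f} min, {old_rate:,.0f} docs/minute")
print(f"current estimate: {new_rate:,.0f} docs/minute per wiki")
print(f"bulks per second per wiki: {bulks_per_second:.2f}")
print(f"estimated enwiki build time: {hours_for_enwiki:.1f} hours")
```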
Possible solutions:
- Batch sizes are very small: the request/response cycle happens multiple times per second, and the current size is only 100 docs per scroll + bulk. Increasing the batch size would cut the number of round trips.
- Figure out what happened to make this so much slower...
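A back-of-the-envelope sketch of why batch size might matter, again in Python. It assumes each cycle is one scroll followed by one bulk, run sequentially, and that whatever time is not covered by the measured scroll and bulk averages is a fixed per-cycle cost that does not grow with batch size. It also assumes scroll and bulk time scale linearly with the number of docs. None of this has been verified against the real build, so treat the projections as a rough guide only.

```python
# Figures from the capture notes above.
BULKS_PER_WIKI = 772        # assumes an even 1/4 split of the 3088 bulks
CAPTURE_SECONDS = 5 * 60
BATCH_SIZE = 100            # current docs per scroll + bulk
SCROLL_SECONDS = 0.100      # average scroll time from the capture
BULK_SECONDS = 0.010        # average bulk time from the capture

# Wall-clock time per scroll + bulk cycle, as seen in the capture.
observed_cycle = CAPTURE_SECONDS / BULKS_PER_WIKI

# Time per cycle accounted for by the measured request averages.
measured_request_time = SCROLL_SECONDS + BULK_SECONDS

# Remainder per cycle that the capture averages do not explain.
unexplained = observed_cycle - measured_request_time


def projected_docs_per_minute(batch_size):
    """Projected rate, assuming `unexplained` is a fixed per-cycle cost
    and request time scales linearly with batch size (both assumptions)."""
    request_time = measured_request_time * batch_size / BATCH_SIZE
    return batch_size / (unexplained + request_time) * 60


print(f"observed cycle: {observed_cycle * 1000:.0f} ms")
print(f"measured scroll + bulk: {measured_request_time * 1000:.0f} ms")
print(f"unexplained per cycle: {unexplained * 1000:.0f} ms")
for size in (100, 500, 1000):
    print(f"batch {size}: {projected_docs_per_minute(size):,.0f} docs/minute")
```

If these assumptions hold, most of each cycle is spent outside the measured scroll and bulk requests, which is worth keeping in mind when looking for where the time went.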