Secondary updates create hundreds of unnecessary writes per second
Closed, Resolved · Public

Authored By
GWicke, Jul 10 2015, 4:01 PM
Referenced Files
F191220: pasted_file (Jul 11 2015, 1:21 AM)
F191222: pasted_file (Jul 11 2015, 1:21 AM)
F191092: pasted_file (Jul 10 2015, 5:37 PM)
F191077: pasted_file (Jul 10 2015, 5:05 PM)

Description

While poking around in toppartitions output I noticed that we seem to rewrite an awful lot of renders per article. Some articles' renders are rewritten 180 times in 10 seconds. It seems that the index update code doesn't actually set an effective limit (possibly broken by the paging work?). It might also be nice to avoid rewriting renders that already have a TTL set.

In any case, this issue seems to be responsible for a good chunk of the write load and heap pressure we are seeing in production.

Sample from restbase1004:

nodetool -h restbase1004.eqiad.wmnet toppartitions local_group_wikipedia_T_parsoid_html data 20000
WRITES Sampler:
  Cardinality: ~2 (256 capacity)
  Top 10 partitions:
        Partition                                                                                       Count       +/-
        de.wikipedia.org:Benutzer\:Baumi                                                                  276         0
        fa.wikipedia.org:مدیاویکی\:Recentchangestext                                                      266         0
        fa.wikipedia.org:ویکیپدیا\:گزارش_دیتابیس/فهرست_مقالههای_پربیننده_از_۵۰۱_تا_۱۰۰۰/تاریخچه_۱       225         0
        fa.wikipedia.org:بحث_الگو\:سردر_تغییرات_اخیر                                                      181         0
        fa.wikipedia.org:الگو\:سردر_تغییرات_اخیر/سایر_ویکیها/افقی                                        176         0
        de.wikipedia.org:Benutzer\:Pixelfire                                                              176         0
        de.wikipedia.org:Benutzer\:Anachron                                                               163         0
        de.wikipedia.org:Benutzer\:Wahldresdner                                                           161         0
        de.wikipedia.org:Benutzer\:CeGe                                                                   151         0
        de.wikipedia.org:Benutzer\:Peter200                                                               151         0

READS Sampler:
  Cardinality: ~2 (256 capacity)
  Top 10 partitions:
        Partition                                                                                       Count       +/-
        de.wikipedia.org:Benutzer\:Baumi                                                                   14         0
        fa.wikipedia.org:مدیاویکی\:Recentchangestext                                                       13         0
        fa.wikipedia.org:بحث_الگو\:سردر_تغییرات_اخیر                                                       12         0
        de.wikipedia.org:Benutzer\:PeeCee                                                                  10         0
        fa.wikipedia.org:ویکیپدیا\:گزارش_دیتابیس/فهرست_مقالههای_پربیننده_از_۵۰۱_تا_۱۰۰۰/تاریخچه_۱        10         0
        de.wikipedia.org:Benutzer\:SchirmerPower/Statistik                                                  9         0
        de.wikipedia.org:Benutzer\:Anachron                                                                 9         0
        de.wikipedia.org:Benutzer\:Wahldresdner                                                             9         0
        de.wikipedia.org:Benutzer\:Peter200                                                                 8         0
        fa.wikipedia.org:الگو\:سردر_قهوهخانه/آزمایش                                                        8         0

Event Timeline

GWicke raised the priority of this task from to Needs Triage.
GWicke updated the task description.
GWicke subscribed.
GWicke triaged this task as Unbreak Now! priority. Jul 10 2015, 4:01 PM
GWicke set Security to None.

As a first step, we could apply a somewhat hacky fix to the limit handling by manually appending + ' limit ' + limit to the CQL string *iff* limit is a number:
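A minimal sketch of what that hotfix could look like, with illustrative names rather than the actual restbase-mod-table-cassandra code:

  // Append a LIMIT clause only when `limit` is a real, positive number,
  // so that a missing or malformed limit can never slip through as an
  // unbounded query.
  function withLimit(cql, limit) {
      if (typeof limit === 'number' && isFinite(limit) && limit > 0) {
          return cql + ' limit ' + limit;
      }
      return cql;
  }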

The effect of deploying the hotfix has been fairly dramatic:

  • read throughput dropped from >200 MB/s to less than 50 MB/s
  • write throughput dropped from about 50 MB/s to less than 10 MB/s
  • iowait dropped from about 10% to 0-1%
  • timeout errors in logstash disappeared
  • SSTables per read (p99) is heading in the right direction (should be at most 5 for HTML)

pasted_file (929 KB)

pasted_file (454 KB)

Longer term impact:

pasted_file (1 MB)

pasted_file (400 KB)

With the regained headroom we turned compaction throughput back up, which you can see as a slight increase in IO bandwidth and niced CPU cycles. Storage size, SSTables per read, and latencies are all shrinking, and there were no OOMs or other issues today.

GWicke lowered the priority of this task from Unbreak Now! to Medium. Jul 11 2015, 1:25 AM
GWicke added a subscriber: Eevans.

Lowering priority, as the main issue is resolved. The code change we deployed this morning was only a hotfix, however, and we'd like to clean this up more thoroughly. @Eevans is working on a solution that

a) cleans up limit handling without breaking paging, and
b) ideally avoids rewriting all but the replaced render by ignoring entries that already have a limited TTL set.

b) has the potential to further reduce write and compaction load from rerenders by about 50%.
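
To illustrate b): a rough sketch of the filtering step, assuming index entries are selected together with their remaining TTL (e.g. via CQL's TTL() function) and that the driver reports a missing TTL as null; names here are illustrative, not the actual implementation:

  // Only entries without a TTL still need the expiring rewrite; entries
  // that already carry a TTL are scheduled to expire and can be skipped.
  function entriesToRewrite(rows) {
      return rows.filter(function(row) {
          return row._ttl === null;
      });
  }

With such a filter in place, a rerender touches only the newly superseded entry instead of rewriting every stored render of the article.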

Based on this experience, we should also make sure that we carefully test async update behavior against possible write amplification issues.

Completed in https://github.com/wikimedia/restbase-mod-table-cassandra/pull/122.

Could we set up unit tests to make sure that we don't reintroduce unnecessary TTL rewrites? One option would be to stagger writes across several seconds and then verify that the resulting TTLs differ.
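
A sketch of what such a test could look like (mocha-style; putRender and getRenderTTLs are hypothetical helpers standing in for the real test harness):

  const assert = require('assert');

  describe('secondary index TTL handling', function() {
      it('does not refresh TTLs on unrelated renders', async function() {
          await putRender('Test_Page', 'tid-1');
          // Stagger the writes so the two TTL countdowns start apart.
          await new Promise(function(resolve) { setTimeout(resolve, 2000); });
          await putRender('Test_Page', 'tid-2');

          const ttls = await getRenderTTLs('Test_Page');
          // If the first entry had been rewritten along with the second,
          // both TTLs would have been refreshed to the same value.
          assert.notStrictEqual(ttls['tid-1'], ttls['tid-2']);
      });
  });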

Done in https://github.com/wikimedia/restbase-mod-table-spec/pull/4