Eliminate Parsoid section.offsets table from Cassandra
Closed, Resolved · Public

Description

Section offsets are a subproperty of data-parsoid, but we store them both in data-parsoid and separately, in the sections.offsets table in Cassandra. The original idea was to optimize section reads so that the full data-parsoid structure would not have to be read from storage and parsed just to extract the offsets. However, I believe that was a premature optimization. The sections endpoints are far from the most popular, and fetching and parsing data-parsoid will probably take a negligible amount of time compared to everything else that has to be done. On the other hand, removing the table will simplify the code, and given that we're short on IO in Cassandra, saving 1/5th of the writes for Parsoid content could be a decent win.

What do you think?

Event Timeline

I agree. On the one hand, section offsets are not requested nearly often enough to warrant their own table, and on the other, RB workers are currently under-utilised, so we can handle the extra CPU needed to extract them from data-parsoid.
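
To illustrate the read path being proposed, here is a rough sketch of pulling the offsets out of the stored data-parsoid blob at request time instead of reading a dedicated table. It is written in Python for illustration only and is not the parsoid.js code. The `sectionOffsets` key, the `html: [start, end]` entry shape, the sample IDs, and the function names are all assumptions made for this example.

```python
import json
from typing import Dict, List

def extract_section_offsets(data_parsoid_json: str) -> Dict[str, List[int]]:
    """Parse a stored data-parsoid blob and return {section_id: [start, end]}.

    Assumption: offsets live under a top-level "sectionOffsets" key and each
    entry carries an "html" pair of character offsets into the page HTML.
    """
    data_parsoid = json.loads(data_parsoid_json)
    offsets = data_parsoid.get("sectionOffsets", {})
    return {sid: entry["html"] for sid, entry in offsets.items() if "html" in entry}

def get_sections(data_parsoid_json: str, html: str, wanted: List[str]) -> Dict[str, str]:
    """Slice the requested sections out of the HTML using the extracted offsets."""
    offsets = extract_section_offsets(data_parsoid_json)
    sections = {}
    for sid in wanted:
        if sid not in offsets:
            raise KeyError(f"unknown section id: {sid}")
        start, end = offsets[sid]
        sections[sid] = html[start:end]
    return sections

# Hypothetical stored content: two sections with made-up element IDs.
page_html = "<h2>Intro</h2><p>Hello there</p>"
stored_data_parsoid = json.dumps({
    "counter": 2,
    "ids": {"mwAQ": {}, "mwAg": {}},
    "sectionOffsets": {
        "mwAQ": {"html": [0, 14]},
        "mwAg": {"html": [14, 32]},
    },
})

requested = get_sections(stored_data_parsoid, page_html, ["mwAg"])
```

The extra cost per sections request is one JSON parse of data-parsoid and a dictionary lookup, which is the CPU-for-IO trade discussed above.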

This might be a bit too advanced to award the good first task tag, but at least it's very straightforward and gives good exposure to the convoluted storage semantics™ and the hell hole of the parsoid.js module, so I will tag it.

Stashbot subscribed.

Mentioned in SAL (#wikimedia-operations) [2018-11-05T20:58:22Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@5b8ad3c]: Update deps, removed sections table, T207904 T206048 T207324

Mentioned in SAL (#wikimedia-operations) [2018-11-05T21:10:37Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@5b8ad3c]: Update deps, removed sections table, T207904 T206048 T207324 (duration: 12m 15s)

Mentioned in SAL (#wikimedia-operations) [2018-11-05T21:14:12Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@5b8ad3c]: Update deps, removed sections table, T207904 T206048 T207324 take 2

Mentioned in SAL (#wikimedia-operations) [2018-11-05T21:23:30Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@5b8ad3c]: Update deps, removed sections table, T207904 T206048 T207324 take 2 (duration: 09m 18s)

Pchelolo claimed this task.
Pchelolo edited projects, added Services (done); removed Services (later).
Pchelolo added a subscriber: Clarakosi.

Yay!!! Since the keyspaces have actually been deleted, we can close the task now. Congrats @Clarakosi