Page MenuHomePhabricator

Consider stashing data-parsoid for VE
Closed, ResolvedPublic

Description

In order to avoid dirty diffs on visual editing, RESTBase needs to make sure that data-parsoid corresponding to the exact render of Parsoid HTML the edit is based on is still present in the storage when the edit is completed. Failing to do so results in broken edits or dirty diffs. In order to achieve that RESTBase employs a very complicated storage semantics when all of the parsoid content is pre-rendered, but the superseded renders of the content are guaranteed to be preserved for a grace_ttl period of time. This code is very complicated and creates a lot of IO overhead since special indexes have to be maintained for renders and revisions.

In order to simplify the system, I propose to change the approach and stash 'data-parsoid' and original HTML in a temporary stashing table with a TTL. This will create 2 additional writes on a read of the HTML, so this should not be done universally. Instead, VisualEditor should supply an additional query parameter indicating that the transform endpoints will be used later on the HTML. Additionally, the HTML provided to VE must not be cached in Varnish.

This task is created to discuss this alternative approach and collect data on how important Varnish caching is for VE use case (how much would this slow it down), how much of an IO win/lose are we looking at, how significant the slowdown from 2 additional writes will be and decide whether to pursue this path.

Event Timeline

Pchelolo created this task.Feb 12 2019, 7:35 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptFeb 12 2019, 7:35 PM
Dinoguy1000 updated the task description. (Show Details)Feb 12 2019, 7:52 PM
Eevans moved this task from Backlog to Watching on the User-Eevans board.Feb 13 2019, 3:25 PM
mobrovac added subscribers: BBlack, mobrovac.

A possible alternative is to differentiate VE's requests based on the UA header: VE sets both Api-User-Agent and User-Agent (sample RB log entry), so it is possible to recognise its requests by checking both headers and matching them against VisualEditor-Mediawiki in both Varnish and RB. This strikes me a slightly better approach than a query parameter since with the latter we can easily suffer (D)DoS attacks. Mind you, checking for headers is only sightly better, but better nevertheless.

As for users noticing the delay, I believe we are talking here about a slow-down of tens of milliseconds, so I don't think that will cause user-facing problems.

@BBlack what do you think of short-circuiting Varnish for VE requests?

Restricted Application added a project: Operations. · View Herald TranscriptFeb 15 2019, 8:10 PM

I don't think complicating Varnish logic worths here, I don't see how UA is better than a query parameter regarding DDOs. We can mitigate the DDOS possibility by setting Varnish max-age grace_ttl smaller then Cassandra stashing TTL for the VE requests and using 'must-revalidate'.

I don't think complicating Varnish logic worths here, I don't see how UA is better than a query parameter regarding DDOs.

It's only slightly better since it's easier to attach a query param than change the UA (even though it's easy to do that as well if you know what you are doing, ofc).

We can mitigate the DDOS possibility by setting Varnish max-age grace_ttl smaller then Cassandra stashing TTL for the VE requests and using 'must-revalidate'.

This would help only for cases where the same domain/title is being requested. If an attacker uses different URIs, it's still no bueno.

Correct me if I'm wrong, but I would think all VE traffic would already be uncacheable at the Varnish level anyways, since it happens in the context of a session (although in the future we might fix this with content composition work). As for the rest of this discussion, I don't think I understand the context enough to say anything about its sanity or whether it increases any attack surface in a way that matters.

Correct me if I'm wrong, but I would think all VE traffic would already be uncacheable at the Varnish level anyways, since it happens in the context of a session.

Ah correct, VE should be setting the cookie. We need to check whether it actually sends it to RESTBase when fetching the HTML. If it does (which it should anyways), we can then simply use the session ID as the stash key.

(although in the future we might fix this with content composition work)

As this is a larger endeavour that will touch a number of layers, we can leave this for later, but it's good to keep it in mind.

Mentioned in SAL (#wikimedia-operations) [2019-02-20T01:52:30Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@751dc5c]: Temporarily collect VE lrequest ogs for T215956

Mentioned in SAL (#wikimedia-operations) [2019-02-20T02:15:07Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@751dc5c]: Temporarily collect VE lrequest ogs for T215956 (duration: 22m 37s)

Mentioned in SAL (#wikimedia-operations) [2019-02-20T18:04:25Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@80f518c]: Remove VE request logging - T215956

Mentioned in SAL (#wikimedia-operations) [2019-02-20T18:24:44Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@80f518c]: Remove VE request logging - T215956 (duration: 20m 19s)

We have logged some VE requests on the RB side, and it turns out we cannot rely on the session ID to be present in the request. VE does send it, but only for logged-in users. As a consequence, some requests do get handled by Varnish. So, it seems like we will need to add a rule to Varnish to pass on these requests, and whether we do it based on the UA or a query param I think there is little difference in practice.

Since we cannot use the session ID, we'll need to find another way of keying the stashed page bundle.

ema moved this task from Triage to Caching on the Traffic board.Mar 6 2019, 9:52 AM
ema triaged this task as Normal priority.Mar 6 2019, 10:16 AM
ema added a subscriber: ema.

it seems like we will need to add a rule to Varnish to pass on these requests

Would it be an option to have RB emit no-cache response headers where appropriate?

it seems like we will need to add a rule to Varnish to pass on these requests

Would it be an option to have RB emit no-cache response headers where appropriate?

This is actually a pretty solid idea, given that RB will know (one way or the other) that it is a VE request.

Since in either case we don't need to have special Varnish rules, I think we need both the query parameter and UA matching in order to ease the transition. The plan would be:

  1. RB looks for both the query parameter and VE UA. If any of the two checks is positive, data-parsoid is stashed
  2. In a second phase, VE starts emitting the query parameter when fetching the HTML
  3. Finally, we drop the UA matching from RB and rely only on the query parameter

This way, we can start using the simplified storage independently of the MW deployment train.

ssastry moved this task from Backlog to Non-Parsoid Tasks on the Parsoid board.Apr 22 2019, 12:53 PM

Mentioned in SAL (#wikimedia-operations) [2019-04-24T20:21:02Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@8a6b6fc]: Parsoid storage simplification step 1: switch Parsoid stashing to simple key/value - T215956

Mentioned in SAL (#wikimedia-operations) [2019-04-24T20:41:41Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@8a6b6fc]: Parsoid storage simplification step 1: switch Parsoid stashing to simple key/value - T215956 (duration: 20m 39s)

Stashing now uses the simple, binary key-value bucket. Next step is to actually switch the main content bucket(s).

Mentioned in SAL (#wikimedia-operations) [2019-04-25T19:25:35Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@7187e0c]: Bump HTML content version in docs, remove Parsoid stash fall-back and start logging all sections requests - T221432 T215956 T216636

Mentioned in SAL (#wikimedia-operations) [2019-04-25T19:45:39Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@7187e0c]: Bump HTML content version in docs, remove Parsoid stash fall-back and start logging all sections requests - T221432 T215956 T216636 (duration: 20m 04s)

Mentioned in SAL (#wikimedia-operations) [2019-04-30T21:57:31Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@b3b140f]: Parsoid: Use the new stash tables for old revisions - T215956

Mentioned in SAL (#wikimedia-operations) [2019-04-30T22:21:28Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@b3b140f]: Parsoid: Use the new stash tables for old revisions - T215956 (duration: 23m 56s)

Mentioned in SAL (#wikimedia-operations) [2019-05-07T21:47:15Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@d91ee4c]: Do not cache html if stash was requested T215956

Mentioned in SAL (#wikimedia-operations) [2019-05-07T21:47:27Z] <ppchelko@deploy1001> deploy aborted: Do not cache html if stash was requested T215956 (duration: 00m 12s)

Mentioned in SAL (#wikimedia-operations) [2019-05-07T21:48:01Z] <ppchelko@deploy1001> Started deploy [restbase/deploy@8f5859f]: Do not cache html if stash was requested T215956

Mentioned in SAL (#wikimedia-operations) [2019-05-07T22:06:12Z] <ppchelko@deploy1001> Finished deploy [restbase/deploy@8f5859f]: Do not cache html if stash was requested T215956 (duration: 18m 12s)

Mentioned in SAL (#wikimedia-operations) [2019-05-21T10:43:56Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@cf00120]: Switch Parsoid to simple k/v bucket - T215956

Mentioned in SAL (#wikimedia-operations) [2019-05-21T11:09:46Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@cf00120]: Switch Parsoid to simple k/v bucket - T215956 (duration: 25m 50s)

Mentioned in SAL (#wikimedia-operations) [2019-05-21T11:46:41Z] <mobrovac> started enwiki dumps - T215956

Mentioned in SAL (#wikimedia-operations) [2019-05-21T11:58:49Z] <mobrovac> started frwiki dumps - T215956

Mentioned in SAL (#wikimedia-operations) [2019-05-21T11:59:41Z] <mobrovac> started dewiki dumps - T215956

Mentioned in SAL (#wikimedia-operations) [2019-05-21T15:30:31Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found - T215956

Mentioned in SAL (#wikimedia-operations) [2019-05-21T15:37:41Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found - T215956 (duration: 07m 10s)

Mentioned in SAL (#wikimedia-operations) [2019-05-22T09:16:50Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - T215956

Mentioned in SAL (#wikimedia-operations) [2019-05-22T09:43:57Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - T215956 (duration: 27m 07s)

Mentioned in SAL (#wikimedia-operations) [2019-05-22T09:52:18Z] <mobrovac> start the en, fr and de wiki dumps again to populate the new parsoid table - T215956

Mentioned in SAL (#wikimedia-operations) [2019-05-22T14:14:16Z] <mobrovac> start it, es wiki dumps (fr and de completed) to fill the new parsoid tables - T215956

Mentioned in SAL (#wikimedia-operations) [2019-05-23T04:24:49Z] <mobrovac> start nl, pt, pl wiki dumps to fill the new parsoid tables - T215956

Mentioned in SAL (#wikimedia-operations) [2019-05-23T06:14:48Z] <mobrovac> start ruwiki dumps to fill the new parsoid tables - T215956

Change 512181 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/restbase/deploy@master] Config: Introduce stash_ratelimit

https://gerrit.wikimedia.org/r/512181

Change 512181 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] Config: Introduce stash_ratelimit

https://gerrit.wikimedia.org/r/512181

Mentioned in SAL (#wikimedia-operations) [2019-05-24T05:58:40Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@b153f5d]: Remove Parsoid fallback and rate-limit stashing - T215956 T224055

Mentioned in SAL (#wikimedia-operations) [2019-05-24T06:20:11Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@b153f5d]: Remove Parsoid fallback and rate-limit stashing - T215956 T224055 (duration: 21m 30s)

Mentioned in SAL (#wikimedia-operations) [2019-05-30T07:21:10Z] <mobrovac@deploy1001> Started deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - T218218 T215956

Mentioned in SAL (#wikimedia-operations) [2019-05-30T07:40:37Z] <mobrovac@deploy1001> Finished deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - T218218 T215956 (duration: 19m 28s)

mobrovac closed this task as Resolved.

Everything here has now been completed \o/ Resolving.

Restricted Application added a project: User-Ryasmeen. · View Herald TranscriptThu, May 30, 8:32 AM