
Parsoid API responses contain unnecessary meta data bloating HTML response
Closed, Resolved (Public)

Description

To support visual editing, Parsoid responses include metadata such as:

data-mw='{"parts":[{"template":{"target":{"wt":"Infobox dot-com company\n","href":"./Template:Infobox_dot-com_company"},

Given that we want to use Parsoid to simplify the page HTML we serve to users, we need to consider how this metadata affects our readers.

We'll need to measure the impact of serving the metadata versus omitting it. If we decide to remove it, we also need to identify ways for VisualEditor to obtain this information without fetching the entire content a second time.

https://en.wikipedia.org/api/rest_v1/page/html/Facebook
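
One rough way to take the measurement described above is to compare the size of a response with and without its data-mw attributes, both raw and gzipped. The following is a minimal sketch, not an agreed methodology: the names `measure` and `DATA_MW` are hypothetical, the regular expression only approximates real attribute parsing, and fetching the HTML (for example from the URL above) is left out.

```python
import gzip
import re

# Assumption: data-mw values are quoted with ' or " and never contain the
# same unescaped quote character. This is an approximation, not a parser.
DATA_MW = re.compile(r"""\sdata-mw=(?:'[^']*'|"[^"]*")""")


def measure(html):
    """Return raw and gzipped byte sizes with and without data-mw attributes."""
    stripped = DATA_MW.sub("", html)
    sizes = {}
    for label, text in (("with", html), ("without", stripped)):
        raw = text.encode("utf-8")
        sizes[label] = {"raw": len(raw), "gzip": len(gzip.compress(raw))}
    return sizes


if __name__ == "__main__":
    # Tiny made-up input; a real measurement would use a full page response.
    sample = "<p data-mw='{\"a\":1}'>x</p>"
    print(measure(sample))
```

The gzipped figures are probably the more relevant ones for readers, since responses are normally compressed in transit and repetitive JSON tends to compress well.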

Event Timeline

Jdlrobson raised the priority of this task to Normal.
Jdlrobson updated the task description.
brion added a subscriber: brion. Sep 30 2015, 7:32 PM

Might consider something like:

  • do an 'unzip' pass on the HTML that saves just the elements and their data-mw attributes on one hand (no other attribs or text content), and everything but the data-mw attributes on the other
  • fetch the data-mw-less pass for viewing
  • on edit request, fetch the data-mw-only version, iterate through, and apply data-mw attributes in the same place

If there's runtime modification of the DOM for showing/hiding, plugins, etc., that may make the 'rezipping' tricky. But it should work in theory. :D
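
The unzip/rezip idea above can be sketched as follows. This is only an illustration under stated assumptions, not how Parsoid or VisualEditor does it: `unzip`, `rezip`, and the `_Rewriter` helper are hypothetical names, the sidecar is keyed by each element's position in document order, and Python's standard `html.parser` stands in for a real DOM.

```python
from html import escape
from html.parser import HTMLParser


class _Rewriter(HTMLParser):
    """Re-serializes HTML, letting a callback rewrite each element's attributes."""

    def __init__(self, rewrite_attrs):
        super().__init__(convert_charrefs=False)
        self.rewrite_attrs = rewrite_attrs
        self.out = []
        self.index = 0  # position of the current element in document order

    def _open(self, tag, attrs, close=""):
        attrs = self.rewrite_attrs(self.index, attrs)
        self.index += 1
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}{close}>")

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, close="/")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def _run(html, rewrite_attrs):
    parser = _Rewriter(rewrite_attrs)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def unzip(html):
    """Split HTML into a data-mw-less view and a {element index: data-mw} sidecar."""
    sidecar = {}

    def strip(index, attrs):
        kept = []
        for name, value in attrs:
            if name == "data-mw":
                sidecar[index] = value
            else:
                kept.append((name, value))
        return kept

    return _run(html, strip), sidecar


def rezip(view_html, sidecar):
    """Reapply data-mw values to the elements at the same document-order positions."""

    def restore(index, attrs):
        if index in sidecar:
            return attrs + [("data-mw", sidecar[index])]
        return attrs

    return _run(view_html, restore)


if __name__ == "__main__":
    # Made-up input for illustration only.
    page = "<p>Intro</p><table data-mw='{\"parts\":[]}' class=\"infobox\"></table>"
    view, sidecar = unzip(page)
    print(view)
    print(rezip(view, sidecar))
```

Because the sidecar is keyed by position, any script that inserts or removes elements between the two passes would shift the indices, which is the 'rezipping' risk noted above. Keying on a stable per-element identifier, if one is available, would likely be more robust.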

See https://phabricator.wikimedia.org/T78676. This new ticket can be merged as a dupe of that one, I think. The only thing that has stopped us from going ahead with that is that all existing editing clients (VE, Flow, CX) would need to update their code to fetch data-mw separately.

I pressed submit hastily, but I think that once those clients are equipped to do this, T78676 will be unblocked.

Jdlrobson removed projects: Epic, Reading-Web-Planning.
Jdlrobson set Security to None.

Thanks @subbu. I'll keep this card since the motivation is different, but resolving that bug looks like it should meet our requirements, so I added it as a blocking bug. Thanks!

Jdlrobson moved this task from Backlog to Tasks on the MobileFrontend board.

Joaquin is looking into this.

Jhernandez closed this task as Resolved. Jun 22 2016, 4:55 PM

The parsing team is splitting data-mw from Parsoid soon :)