Client-side approach to true section editing: Load/display document-level data (e.g. internal lists, meta lists) for the whole document but only the content for a part of the document
Closed, Resolved; Public

Description

Somehow(?) load/display document-level data (e.g. internal lists, meta lists) for the whole document but only the content for a part of the document.

Will probably need T49344: Internal nodes should eventually be in a separate document ("sub-documents") for the document splitting.

Event Timeline

Jdforrester-WMF raised the priority of this task to Medium.
Jdforrester-WMF updated the task description.
Jdforrester-WMF changed Security from none to None.
Jdforrester-WMF added a subscriber: Catrope.

We've had a somewhat-related discussion in Parsoid-land about being able to fetch only subsets of the DOM tree from RESTBase. The idea there was that RESTBase would do the actual subtree extraction from the full document, using the fast regexp-based matching that @GWicke wrote. But you'd also want to extract and fetch data-mw and data-parsoid for the same subtree. We've talked about tree-structuring the data-mw and data-parsoid JSON blobs to make that possible (using a variant of @GWicke's HTML regexp technique), but I'm not sure we've actually implemented that yet.

We already support fetching specific HTML sections by ID in the REST API (see https://en.wikipedia.org/api/rest_v1/#!/Page_content/get_page_html_title), but until consistent <section> wrapping with a sensible granularity & perhaps a predictable section ID for the lead section are implemented in Parsoid (T114072), this is not as useful in practice as it could be.

Behind the scenes, the section-by-ID retrieval is currently based on byte offset mappings provided by Parsoid and stored in Cassandra. Given the performance we see in web-html-stream (the library @cscott mentioned), we could, however, get rid of the explicit section offsets stored in Cassandra and do the matching / extraction dynamically instead.
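To make the two retrieval strategies concrete, here is a minimal sketch that contrasts them. It is not RESTBase or web-html-stream code: the function names, the shape of the offset table, and the assumption that every section is wrapped in a `<section id="...">` element are all hypothetical, chosen only for illustration.

```python
import re
from typing import Dict, Optional, Tuple


def extract_by_offsets(
    html: bytes, offsets: Dict[str, Tuple[int, int]], section_id: str
) -> bytes:
    """Stored-offset strategy: slice the bytes using a precomputed table.

    `offsets` is a hypothetical {section_id: (start, end)} byte-offset map.
    """
    start, end = offsets[section_id]
    return html[start:end]


def extract_by_scan(html: bytes, section_id: str) -> Optional[bytes]:
    """Dynamic strategy: find the section at request time with regexps.

    Assumes (hypothetically) well-formed <section id="..."> wrappers that
    may be nested. Returns None if the section is not found.
    """
    open_re = re.compile(
        rb'<section\b[^>]*\bid="' + re.escape(section_id.encode()) + rb'"[^>]*>'
    )
    opening = open_re.search(html)
    if opening is None:
        return None
    # Walk the following <section> / </section> tags, tracking nesting depth.
    tag_re = re.compile(rb"<(/?)section\b[^>]*>")
    depth = 1
    for tag in tag_re.finditer(html, opening.end()):
        depth += -1 if tag.group(1) else 1
        if depth == 0:
            return html[opening.start():tag.end()]
    return None  # unbalanced markup
```

The first function needs the offset table to be kept in sync with every stored revision; the second trades that storage for a scan per request, which is only attractive if the scan is fast enough.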

Apart from the server-side section retrieval API, clients will need to discover the page outline & section IDs. The reading team has already been doing some section loading in apps & mobile web. This is not yet fully general, but is moving in that direction. Metadata about sections is exposed in the page summary endpoint. Another source of metadata is the lead section endpoint used by apps, which additionally includes the lead section content itself. Eventually, I hope that we will have standard <section>s in Parsoid output, and can refer to those in these summary endpoints for easy section retrieval.

Jdforrester-WMF renamed this task from Somehow(?) load document-level data (e.g. internal lists, meta lists) for the whole document but only the content for a part of the document to Load/display document-level data (e.g. internal lists, meta lists) for the whole document but only the content for a part of the document. (May 20 2018, 7:39 AM)
Jdforrester-WMF assigned this task to dchan.
Jdforrester-WMF updated the task description.
Jdforrester-WMF changed the point value for this task from 8 to 40.

Change 433753 had a related patch set uploaded (by Divec; owner: Divec):
[VisualEditor/VisualEditor@master] WIP POC: Support only surfacing part of the document

https://gerrit.wikimedia.org/r/433753

As well as the approach in https://gerrit.wikimedia.org/r/433753 , we should also consider the following alternative:

  • A class provides the LinearData interface on a live slice of the linear data (performing offset translation and bounds checking)
  • ve.dm.Document builds on top of that class

That approach would more robustly prevent access outside the slice. One drawback is that references from inside the slice to content outside it would appear to dangle.
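A minimal sketch of that alternative, to make the offset translation and bounds checking concrete. This is not VisualEditor code: `LinearDataSlice` and its methods are hypothetical names, and a plain Python list stands in for the linear data.

```python
from typing import Any, Iterable, List


class LinearDataSlice:
    """Live window onto a shared linear-data list (hypothetical class)."""

    def __init__(self, data: List[Any], start: int, end: int) -> None:
        if not 0 <= start <= end <= len(data):
            raise ValueError("slice bounds lie outside the underlying data")
        self._data = data  # shared with the full document, never copied
        self._start = start
        self._end = end

    def __len__(self) -> int:
        return self._end - self._start

    def _translate(self, offset: int) -> int:
        """Map a slice-relative offset to an offset in the full data."""
        if not 0 <= offset < len(self):
            raise IndexError(f"offset {offset} is outside the slice")
        return self._start + offset

    def get(self, offset: int) -> Any:
        return self._data[self._translate(offset)]

    def set(self, offset: int, item: Any) -> None:
        self._data[self._translate(offset)] = item

    def splice(self, offset: int, remove: int, items: Iterable[Any]) -> List[Any]:
        """Replace `remove` items at `offset`; return the removed items."""
        if remove < 0 or not 0 <= offset <= offset + remove <= len(self):
            raise IndexError("splice range is outside the slice")
        new_items = list(items)
        lo = self._start + offset
        removed = self._data[lo:lo + remove]
        self._data[lo:lo + remove] = new_items
        # Keep the window in step with the edit made to the shared data.
        self._end += len(new_items) - remove
        return removed
```

Because the view holds a reference to the full list rather than a copy, edits made through it are immediately visible in the whole document, while any offset outside the window raises instead of silently touching other content. The sketch only tracks edits made through the view itself; edits made directly to the underlying list before the window would need separate handling.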

Change 440809 had a related patch set uploaded (by Divec; owner: Divec):
[mediawiki/extensions/VisualEditor@master] WIP Support only surfacing part of the document

https://gerrit.wikimedia.org/r/440809

marcella renamed this task from Load/display document-level data (e.g. internal lists, meta lists) for the whole document but only the content for a part of the document to Client-side approach to true section editing: Load/display document-level data (e.g. internal lists, meta lists) for the whole document but only the content for a part of the document. (Nov 7 2018, 7:25 PM)
marcella removed the point value for this task.

Change 433753 merged by jenkins-bot:
[VisualEditor/VisualEditor@master] Support only surfacing part of the document

https://gerrit.wikimedia.org/r/433753

Change 489319 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (b629aa8b1)

https://gerrit.wikimedia.org/r/489319

Change 489319 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Update VE core submodule to master (c4d559b29)

https://gerrit.wikimedia.org/r/489319

Change 440809 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Support only surfacing part of the document

https://gerrit.wikimedia.org/r/440809

Esanders subscribed.

Currently this is disabled behind a feature flag, so QA should just check that editing/section editing is unaffected by the code refactor. We will do a separate QA for the feature itself when we get around to enabling it by default.

I think the testing of this refactor is pretty much done. I'm going to go ahead and mark it as verified. If we find more bugs during further integration testing, we will file new tasks. At least this one won't have to sit around in the QA column.