
Parsing team: Q3 2015-16 goals planning dependency tracker task
Closed, Resolved, Public

Description

Reading, Visual Editor, Flow, Language, Services and possibly Community Tech undertake work that requires support from the Parsing Team. To help with our team's Q3 planning process, it would be helpful to know about the projects for which you would like our support.

This is a tracker task to identify Q3 dependencies on the Parsing team. Feel free to either add this task as a blocker on those of your team's tasks that depend on Parsing team work in Q3 2015-16 (Jan - Mar 2016), or add a comment here. We will use this information to figure out what we can reasonably prioritize in Q3. Tasks that we cannot get done in Q3 will have the blocker removed.

Event Timeline

ssastry raised the priority of this task from to High.
ssastry updated the task description.
ssastry subscribed.
Restricted Application added a subscriber: Aklapper.
Arrbee added subscribers: Amire80, santhosh.
Arrbee subscribed.

Some things that I'd love to see prioritized on your end are:

  • Improve how Parsoid marks up semantic content elements like navboxes, infoboxes, tables of contents (T105845), and sections (T114072), to make it easier and more efficient to work with and post-process the content.
  • Provide component metadata (T105845), to aid in composition of content & solving generic issues like dependency tracking, cache invalidation, redirects.
  • Multimedia support, even if it's "just" <video> tags.
  • Not a hard blocker, but I think we should seriously tackle this to avoid driving projects like CX into implementing their own: T116350: Design and implement an algorithm to provide stable element ids
Regarding multimedia support: this is one of our Q2 goals, but we suspect some or all of it will spill over to Q3.

From Mobile-Content-Service (Reading) we would like to see the following improvements in preference order:

  • T118306 [RB] Make image description pages on commons work
  • T39902 (dup T117519) Red links
  • T119265 More metadata
  • T118882 [RB] Language variant handling (zh-hans vs zh-hant)

[RB]: probably something that needs to be changed on the RESTBase level.

Here are some wishlist items from Reading Web:

  • Stable heading IDs shared with the MediaWiki parser.
  • First class sections
  • Extremely lean html endpoint.
    • Instead of the extremely verbose 2-way html-wikitext html, let's have a standard extremely lean html with all the transforms that Android content service is doing. (Minimal payload, fastest rendering)

I'll add more if I think of them.

Regarding the extremely lean HTML endpoint: note that this is basically T78676: Store & load data-mw separately, and it requires VisualEditor (cc @Jdforrester-WMF, @Esanders) and Content Translation (cc @santhosh) to have functionality in place to load data-mw separately. So, our implementing this is blocked on VE and CX acquiring that functionality.

To make this more concrete, a basic implementation of this would

  • make another API request to /data-mw/{title}{/revision} to fetch the data-mw metadata, and
  • iterate through each id in the returned object, and add a data-mw attribute to the corresponding DOM node returned by getElementById.

Pseudo code:

// Assuming dataMw holds the parsed data-mw response, and
// doc holds the DOM
Object.keys(dataMw).forEach(function(id) {
  var node = doc.getElementById(id);
  if (node) {
    node.dataset.mw = JSON.stringify(dataMw[id]);
  } else {
    throw new Error("Node corresponding to id " + id + " not found!");
  }
});
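
The two steps above can also be wrapped into small helpers. This is only a minimal sketch, not code from RESTBase, VE or CX: `dataMwPath` and `applyDataMw` are hypothetical names, the path shape simply follows the `/data-mw/{title}{/revision}` template mentioned above, and missing ids are collected and returned rather than thrown so that the caller can decide how strict to be.

```javascript
// Hypothetical helper: build the request path from the
// /data-mw/{title}{/revision} template (the revision part is optional).
function dataMwPath(title, revision) {
  var path = '/data-mw/' + encodeURIComponent(title);
  if (revision !== undefined && revision !== null) {
    path += '/' + encodeURIComponent(String(revision));
  }
  return path;
}

// Hypothetical helper: copy each entry of the parsed data-mw response onto
// the DOM node with the matching id. Returns the ids that had no node.
function applyDataMw(doc, dataMw) {
  var missing = [];
  Object.keys(dataMw).forEach(function (id) {
    var node = doc.getElementById(id);
    if (node) {
      // setAttribute is used instead of dataset so this also works with
      // DOM implementations that do not provide dataset.
      node.setAttribute('data-mw', JSON.stringify(dataMw[id]));
    } else {
      missing.push(id);
    }
  });
  return missing;
}
```

A caller would fetch `dataMwPath(title, revision)` with whatever HTTP client it already uses, parse the JSON body, and pass the result to `applyDataMw` before handing the DOM to the editor.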

@Esanders says they can get this functionality in place. @santhosh how about CX?

@ssastry, CX can also do this. The method @GWicke mentioned should work. We (Parsing, VE and CX) just need to coordinate on the timeline of implementation and deployment.

I am assuming that the HTML->Wikitext conversion path is unaffected and that it continues to accept HTML with data-mw in it.

Regarding the lean HTML endpoint being blocked on VE and CX: yup. It would be useful if we could at least turn this off via an optional API parameter in the interim.

Here are our draft goals for this quarter:

  1. Leaner HTML by stripping data-mw and storing it in a separate bucket in RESTBase -- requires coordinating the deploy with Services, CX, VE, OCG after code on their end is implemented to repopulate data-mw in the DOM
  2. Improved multimedia support in Parsoid -- RFCs go through ArchCOM; DOMSpec updated; updates to PHP parser + Parsoid
  3. Majority of the blockers for replacing Tidy identified and resolved -- mass visual diff testing infra in place; processes in place for fixing templates and pages affected by the switch; PHP parser / Parsoid changes in place where necessary

Anything else we get done will be a bonus, perhaps including pieces of T119265: More metadata in Parsoid output.
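
For illustration, the stripping half of the first goal is roughly the inverse of the repopulation step that clients need to implement. The following is a rough sketch only, assuming that every element carrying data-mw also has an id; `extractDataMw` is a hypothetical name, and how the resulting object is stored in RESTBase is left out entirely.

```javascript
// Hypothetical helper: move all data-mw attributes out of the DOM into a
// plain object keyed by element id, leaving leaner HTML behind.
function extractDataMw(doc) {
  var dataMw = {};
  var nodes = doc.querySelectorAll('[data-mw]');
  Array.prototype.forEach.call(nodes, function (node) {
    var id = node.getAttribute('id');
    if (!id) {
      // Assumption: without an id there is no key to store the metadata under.
      throw new Error('Element with data-mw has no id; cannot store it separately.');
    }
    dataMw[id] = JSON.parse(node.getAttribute('data-mw'));
    node.removeAttribute('data-mw');
  });
  return dataMw;
}
```

The returned object has the same id-keyed shape that the client-side repopulation loop iterates over, so the two steps round-trip as long as the ids stay stable.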

Regarding an optional API parameter in the interim: leaner HTML is one of our goals for next quarter, so it may be better to get that done. But, separate from that, if you want an interim flag, this would have to be something that RESTBase provides.

ssastry claimed this task.