
Wikimedia Technical Conference 2018 Session - Identifying the requirements and goals for the parser
Closed, Resolved · Public

Description

Session Themes and Topics

  • Theme: Architecting our code for change and sustainability
  • Topic: Parser and Wikitext

Session Leader

Facilitator

  • Kate Chapman

Description

This session is focused on ensuring we have the technical requirements for the parser understood as we continue the process of unifying the PHP parser and Parsoid. This includes identifying product directions around wikitext that may impact the requirements of the unified parser long-term.

Questions to answer during this session

Question / Significance: why is this question important, and what is blocked by it remaining unanswered?

1. What is the product vision for visual editing and editing on mobile? If the majority of edits become visual rather than source, how does this impact parser design? What other product goals are likely to impact the design of the parser, and how will they do that?
   Significance: If products are heading towards a WYSIWYG or a micro-edit experience for the majority of users, it makes sense to evaluate the needs of the parser in that light. Answers to this question could guide how wikitext might evolve, in what ways, and what kinds of tools the parser might need to support.
2. What are the trade-offs between unifying the parsers in a Node.js implementation vs a PHP implementation?
   Significance: Before unifying the parser into PHP, we should ensure there are no use cases or reasons to keep the parser in JS, such as clients parsing in the browser or in apps. We should also make sure any future needs of VE are accounted for before making this move.
3. What are the impacts of parser speed on our technical infrastructure (specifically regarding storage)? What is a good goal for parser speed? What does it mean to be fast (returning HTML from storage is fast, but does generating the HTML also need to be fast)? Should we only be concerned with balanced templates, so that we do not have to regenerate a whole page when content changes?
   Significance: Parser speed has been mentioned in several contexts, but it isn't clear what is meant by it. Are engineers concerned with processor load when regenerating pages, or are client engineers and PMs concerned with response time? Are we concerned with worst-case or median times? Is this a user concern or an infrastructure concern? Is the parser already fast enough? Unbalanced templates are a known issue here as well, since they can modify the rest of the page.
4. Should wikitext be the canonical storage format for content in MediaWiki? What are the trade-offs between storing HTML vs wikitext?
   Significance: Does it make sense to store content as wikitext if we are returning HTML to clients 99% of the time? Storing HTML would remove some of the burden from the parser, since we would only need to support converting to wikitext when a user wants to edit in wikitext.
5. Should having a deterministic/repeatable parser be a goal? Is it useful to have a concept of static vs dynamic templates (see the sketch after this list)? What are the advantages of doing this? What are the roadblocks? (Specifically discuss wikitext, templates, and Lua modules.)
   Significance: Not having a deterministic parser has been identified as one of the major reasons to store edits for VE on the server. Does being able to guarantee that most of the page stays the same actually get us any benefits? We know that dynamic content is possible in templates, but if we isolate and contain that logic, does that provide benefits?
6. Do we want to evolve wikitext? If so, what aspects / shortcomings do we want to target? What are possible solutions for addressing them? What considerations should we factor into any such evolution path?
   Significance: A number of challenges we now face in the parser and in our products are an outgrowth of wikitext and how it is processed. Certain editing, technology, and usability goals might be advanced / enabled by suitably updating wikitext. But since this directly impacts editor workflows, it needs to be addressed carefully.
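
To make the static vs dynamic distinction in Q5 concrete, here is a minimal sketch (hypothetical code, not MediaWiki's actual template machinery) of a deterministic renderer versus one that depends on ambient state such as the clock:

```python
import datetime

def render_static(args):
    # Deterministic: output depends only on its inputs, so the rendered
    # fragment can be cached and reused until an input changes.
    return "<b>{name}</b> (b. {born})".format(**args)

def render_dynamic(args):
    # Non-deterministic: output depends on ambient state (the clock,
    # as with {{CURRENTTIME}}), so every parse may produce different
    # HTML and cached output can silently go stale.
    now = datetime.datetime.utcnow().strftime("%H:%M")
    return "As of {}: <b>{}</b>".format(now, args["name"])

args = {"name": "Ada Lovelace", "born": 1815}
assert render_static(args) == render_static(args)  # repeatable
# render_dynamic(args) may differ between two calls made a minute apart.
```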

Keep in mind:

  • The questions proposed above are based on a synthesis of input the PC received about the content of the conference. So, even if answers to some questions might be obvious to some of you, they are there so that they can be explicitly answered, documented, and used to chart roadmaps without having to revisit them over and over again.

Facilitator and Scribe notes

Facilitator reminders

Session Structure

  • Intro to session, questions, background, session structure (5 mins)
  • Have product folks respond to Q1 and field any questions (5 mins; we can stretch this if required)
  • For the other five questions (Q2–Q6), we'll have posters set up around the room for each question (pre-seeded with information we already have from our previous engagements and discussions), where participants can add notes by writing, sticking post-its, or +1ing existing entries. This lets us get everyone's input in the most efficient way possible. (~20 mins)
  • Regroup, summarize, and identify any points of agreement or contention and any new unanswered questions, and identify strategies for moving forward, including, where possible, who is responsible for that work. Depending on the outcome, we might decide to strategize as a big group or split up into smaller groups. The process will be refined by the day of this session. (~30 mins)

Resources:


Session Leaders please:

  • Add more details to this task description.
  • Coordinate any pre-event discussions (here on Phab, IRC, email, hangout, etc).
  • Outline the plan for discussing this topic at the event.
  • Optionally, include what it will not try to solve.
  • Update this task with summaries of any pre-event discussions.
  • Include ways for people not attending to be involved in discussions before the event and afterwards.

Post-event Summary:

  • ...

Action items:

  • ...

Event Timeline

kchapman renamed this task from Wikimedia Technical Conference 2018 Session - What are goals for the parser? (can we run offline) to Wikimedia Technical Conference 2018 Session - Identifying the requirements and goals for the parser. (Oct 3 2018, 1:20 AM)
debt added a subscriber: ssastry.
debt updated the task description. (Show Details)
debt edited subscribers, added: kchapman; removed: ssastry.

Re the first question, I feel the "significance" section presupposes a particular answer (ie, that WYSIWYG editors aren't appropriate for templates, and so wikitext editing should be limited). I don't agree (T114454), but regardless, here's my attempt at rephrasing the prompt in a neutral manner:

"If products are heading toward a WYSIWYG (or micro-edit, or...) experience, does that limit the tasks we can accomplish? Should we try to narrow or broaden the abilities of our WYSIWYG (or micro-edit, or...) tools? If wikitext is a tool for a subset of editors, does that affect goals for the parser? Conversely, if WYSIWYG is intended to be a complete editing experience, what parser limitations do we need to lift?"

Re question #2 (JS vs PHP): it's important to understand the role of the social ecosystem involved. One reason why Markdown is widespread (and wikitext is not used at all outside the WMF environment) is that we've never had a really good, fast standalone parser. Even our own WMF research department uses mwparserfromhell (written in Python) instead of either of the two "official" WMF parsers. (And they probably won't migrate to an official parser even if/once Parsoid is ported to PHP.)
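
For context, consuming wikitext with mwparserfromhell looks roughly like this (the example text is made up):

```python
import mwparserfromhell

text = "{{Infobox person|name=Ada Lovelace}} was a [[mathematician]]."
wikicode = mwparserfromhell.parse(text)

# Enumerate templates and their parameters without a full MediaWiki stack.
for tpl in wikicode.filter_templates():
    print(tpl.name, [str(p) for p in tpl.params])

print(wikicode.strip_code())  # plain text with markup stripped
```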

One way forward is to encourage fast, reliable access to already-parsed article content; that's what the RESTBase API and the HTML-based DOM format were intended to do. (But it's not enough for ORES; discuss...)
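
As a concrete example, a client can fetch pre-parsed Parsoid HTML through the public REST API without running any wikitext parser itself (a minimal sketch using the English Wikipedia endpoint):

```python
import requests

# Parsoid HTML served from storage, rather than parsed on demand.
url = "https://en.wikipedia.org/api/rest_v1/page/html/Ada_Lovelace"
resp = requests.get(url, headers={"User-Agent": "parser-session-demo/0.1"})
resp.raise_for_status()
html = resp.text  # annotated Parsoid DOM, ready for clients to consume
```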

Another is what we broadly discuss as "wikitext 2.0": gradually modernizing and simplifying wikitext to the point where third-party implementations are "easy" and "fast". (That's potentially a long road...)

A third is "zero parsers in core": trying to decouple MediaWiki from the specific "wikitext" article representation, to make it possible to use "legacy wikitext", "wikitext 2.0", markdown, HTML-native, or any other format. We can potentially have reasonable round-trip conversions between these, and the editors can be decoupled entirely from the storage format. (If you can't beat them, join them...)
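
One way to picture "zero parsers in core" is a pluggable format registry, where core never parses anything itself and only dispatches to registered converters (a hypothetical sketch, not MediaWiki's actual ContentHandler API):

```python
from abc import ABC, abstractmethod

class ContentFormat(ABC):
    # Hypothetical plug-in point: core stores opaque source text plus a
    # format id, and delegates rendering and serialization to the format.
    @abstractmethod
    def to_html(self, source: str) -> str: ...

    @abstractmethod
    def from_html(self, html: str) -> str: ...

FORMATS = {}  # e.g. "legacy-wikitext", "wikitext-2.0", "markdown", "html"

def render(format_id, source):
    # Core only dispatches; parsing lives entirely in format plug-ins.
    return FORMATS[format_id].to_html(source)

def convert(src_format, dst_format, source):
    # Round-tripping between formats via the shared HTML representation.
    return FORMATS[dst_format].from_html(FORMATS[src_format].to_html(source))
```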

These issues are fundamentally tied to the social question of which parser people use for wikitext and how it is written, but have very little to do with the technical merits of one programming language or environment over another.

> Re the first question, I feel the "significance" section presupposes a particular answer (ie, that WYSIWYG editors aren't appropriate for templates, and so wikitext editing should be limited). I don't agree (T114454), but regardless, here's my attempt at rephrasing the prompt in a neutral manner:
>
> "If products are heading toward a WYSIWYG (or micro-edit, or...) experience, does that limit the tasks we can accomplish? Should we try to narrow or broaden the abilities of our WYSIWYG (or micro-edit, or...) tools? If wikitext is a tool for a subset of editors, does that affect goals for the parser? Conversely, if WYSIWYG is intended to be a complete editing experience, what parser limitations do we need to lift?"

Thanks for pointing out that the prompt needed changing. Based on your comment here, and after pondering it more, I tweaked it in a different way that hopefully captures the spirit of your comment.

ssastry updated the task description. (Show Details)