Session Themes and Topics
- Theme: Architecting our code for change and sustainability
- Topic: Parser and Wikitext
Session Leader
- Subbu Sastry (@ssastry)
Facilitator
- Kate Chapman
Description
This session is focused on ensuring we have the technical requirements for the parser understood as we continue the process of unifying the PHP parser and Parsoid. This includes identifying product directions around wikitext that may impact the requirements of the unified parser long-term.
Questions to answer during this session
Question | Significance: Why is this question important? What is blocked by it remaining unanswered? |
1. What is the product vision for visual editing and editing on mobile? If edits become majority visual vs source, how does this impact parser design? What other product goals are likely to impact the design of the parser and how will they do that? | If products are heading towards a WYSIWYG or a micro-edit experience for the majority of users, it makes sense to evaluate the needs of the parser in that light. Answers to this question could guide how wikitext might evolve, in what ways, and what kind of tools the parser might need to support. |
2. What are the trade-offs between unifying the parsers to a Node.js implementation vs a PHP implementation? | Prior to unifying the parser into PHP, we should ensure there are no use cases or reasons to keep the parser in JS, such as clients parsing in the browser or in apps. Additionally, we should make sure any future needs for VE are accounted for before making this move. |
3. What are the impacts of parser speed on our technical infrastructure (specifically regarding storage)? What is a good goal for speed of the parser? What does it mean to be fast (returning HTML from storage is fast, but does it need to be fast when generating the HTML?)? Should we only be concerned with balanced templates so that we do not have to regenerate a whole page when content changes? | Speed of the parser has been mentioned in several contexts. It isn’t clear what is meant by this. Are engineers concerned with processor load when regenerating pages or are client engineers and PMs concerned with response time? Are we concerned with the worst case or median times? Is this a user concern or an infrastructure concern? Is the parser already fast enough? Unbalanced templates are known to be an issue here as well since they can modify the rest of the page. |
4. Should wikitext be the canonical storage for content in MediaWiki? What are the trade-offs between storing HTML vs Wikitext? | Does it make sense to store content as Wikitext if we are returning HTML to clients 99% of the time? Storing HTML seems to take some of the burden off the parser, since we would only need to support converting to Wikitext when a user wants to edit in Wikitext. |
5. Should having a deterministic/repeatable parser be a goal? Is it useful to have a concept of static vs dynamic templates? What are the advantages to doing this? What are the roadblocks to this? (Specifically discuss Wikitext, Templates, Lua modules) | Not having a deterministic parser has been identified as one of the major reasons to store edits for VE on the server. Does being able to guarantee that most of the page stays the same actually get us any benefits? We know that dynamic content is possible in templates, but if we close them and contain that logic, does it provide benefits? |
6. Do we want to evolve wikitext? If so, what aspects / shortcomings do we want to target? What are possible solutions for addressing them? What are the considerations we should factor into any such evolution path? | A number of challenges we now face in the parser and in our products are an outgrowth of wikitext and how it is processed. Certain editing, technology, and usability goals might be advanced / enabled by suitably updating wikitext. But, since this directly impacts editor workflows, this needs to be addressed carefully. |
Keep in mind:
- The questions proposed above are based on a synthesis of input the PC received about the content of the conference. So, even if answers to some questions might be obvious to some of you, the reason they are there is so that they can be explicitly answered, documented, and used to chart roadmaps without having to revisit them over and over again.
Facilitator and Scribe notes
Facilitator reminders
Session Structure
- Intro to session, questions, background, session structure (5 mins)
- Have product folks respond to Q1 and engage any questions: (5 mins - we can stretch this if required)
- For the other 5 qns (Q2 - Q6), we'll have posters set up around the room for each question (and pre-seeded with information we already have from our previous engagements & discussions) where participants can add more notes by writing, sticking post-its, or +1ing existing entries. This lets us get everyone's input in the most efficient way possible. (~20 mins).
- Regroup, summarize, and identify any points of agreement, contention, any new unanswered questions, and identify strategies for moving forward including, where possible, identifying who is responsible for that work. Depending on outcome, we might decide to strategize as a big group or split up into smaller groups. Process will be refined by the time we get to the day of this session. (~30 mins)
Resources:
- Parser Unification
- Moving Parsoid to Core
- Parsoid-PHP phabricator board
- T93715: [EPIC] Make Parsoid HTML output completely deterministic
- T112999: Let MediaWiki operate entirely without wikitext
- T114454: [RFC] Visual Templates: Authoring templates with Visual Editor
- T204375: Wikitext 2.0 as low-bandwidth transport for client-side rendering
- Parsing Team thoughts about wikitext 2.0
- Devsummit 2017 talk by Parsing team members about wikitext 2.0
Session Leaders please:
- Add more details to this task description.
- Coordinate any pre-event discussions (here on Phab, IRC, email, hangout, etc).
- Outline the plan for discussing this topic at the event.
- Optionally, include what it will not try to solve.
- Update this task with summaries of any pre-event discussions.
- Include ways for people not attending to be involved in discussions before the event and afterwards.
Post-event Summary:
- ...
Action items:
- ...