High-level Approach (How)
The goal of this project is to enable rich text formatting in Wikifunctions' output, specifically allowing basic HTML tags such as bold, italic, and underline, as well as hyperlinks. This will enhance the usability and expressiveness of the content generated by Wikifunctions.
Approach Overview:
- Define User-Relevant Outcome: The first step is to determine a clear user-facing result for this Proof of Concept (POC), for example a sentence that combines bold, italic, and underlined text with links and that is relevant to Wikipedia users. Can we define a user-relevant end result that we want this POC to deliver? ❓Decision on output: TBD
- Z89 Function Model: We will design a function model in which functions return HTML strings wrapped in a Z89/HTML Fragment type. These fragments will be rendered both on the Wikifunctions platform and within Wikipedia articles.
- Sanitization: To avoid the risks of unsafe HTML, we will sanitize all generated HTML at the PHP level using the MediaWiki Sanitizer. This ensures that only allowed HTML tags are used and prevents malicious code from being included in outputs. Still open for discussion is where links may point: can people link to any website (with the risk of inappropriate links), or only to internal wiki pages? And what about links in Wikidata items (in Identifiers, for example)? ❓Decision on links: TBD
- Frontend Rendering in Wikifunctions: A dedicated frontend component will be created to safely render Z89/HTML Fragment in the Wikifunctions interface. This component will be similar to some existing components, but tailored for HTML output.
- Rendering in Wikipedia Articles: After ensuring the Z89 HTML output works within Wikifunctions, we will modify the system so that these HTML fragments are properly rendered in Wikipedia articles. This will require collaboration with the Content Transform Team, particularly for Parsoid, which is responsible for transforming content between wikitext and HTML.
- [Optional] User Guidance and Documentation: Clear documentation and UI hints will be provided to guide users on how to apply rich text formatting using HTML tags. The documentation will help them understand which tags are supported and how to use them effectively.
- Testing and Validation: Thorough testing will be conducted to ensure the functionality works as expected, including security testing (to prevent XSS attacks), usability testing, and performance testing (to ensure rendering doesn't affect page load times).
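To make the open question about link targets more concrete, here is a minimal sketch of how a link policy could classify an href, assuming the decision were to distinguish internal wiki links from external ones. The function name `classifyLink` and the domain list are hypothetical and only for illustration; the actual policy is still TBD and would be enforced at the PHP level.

```javascript
// Hypothetical list of domains treated as "internal"; an assumption, not a project decision.
const INTERNAL_DOMAINS = ["wikipedia.org", "wikifunctions.org", "wikidata.org"];

// Classify an href as "internal", "external", or "rejected".
// Anything that is not an absolute http(s) URL is rejected.
function classifyLink(href) {
  let url;
  try {
    url = new URL(href); // throws on relative or malformed input
  } catch (e) {
    return "rejected";
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return "rejected"; // e.g. javascript: or data: URLs
  }
  const host = url.hostname.toLowerCase();
  const isInternal = INTERNAL_DOMAINS.some(
    (domain) => host === domain || host.endsWith("." + domain)
  );
  return isInternal ? "internal" : "external";
}
```

Matching on the full domain or a dot-prefixed suffix avoids treating look-alike hosts such as `evilwikipedia.org` as internal.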
Acceptance Criteria
- Formatted Text Output: Functions must support outputting text that includes basic formatting, namely bold (<strong> or <b>), italic (<em> or <i>), and underline (<u>), and this formatting must display correctly both within Wikifunctions and when transcluded into Wikipedia articles. The system should allow both semantic (<strong>, <em>) and presentational (<b>, <i>) tags, leaving the choice to the community. Beyond that, we should probably support whatever the MediaWiki Sanitizer allows.
- Hyperlink Support: Users can include functional hyperlinks (<a href="...">) in Wikifunctions string outputs.
- HTML Sanitization: Any HTML content in the function output must be properly sanitized to prevent the injection of malicious or unsafe code, ensuring the safety of both Wikifunctions and Wikipedia.
- [Stretch Goal] User Guidance for Rich Text: Users must be provided with clear instructions, UI prompts, or documentation that guides them on how to include rich text formatting in their function outputs.
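As an illustration of the first two criteria, the following sketch shows how a JS implementation might build a Z89/HTML Fragment containing bold text and a link. The object shape (Z1K1, Z89K1) follows the example used elsewhere in this document; the helper names `escapeHtml`, `makeHtmlFragment`, and `boldLabelWithLink` are hypothetical. Escaping the arguments here is a precaution on the function side and does not replace server-side sanitization.

```javascript
// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Wrap an HTML string in a Z89/HTML Fragment object.
function makeHtmlFragment(html) {
  return { Z1K1: "Z89", Z89K1: html };
}

// Hypothetical function body: a bold label followed by a hyperlink.
function boldLabelWithLink(label, href, linkText) {
  const html =
    `<strong>${escapeHtml(label)}</strong> ` +
    `<a href="${escapeHtml(href)}">${escapeHtml(linkText)}</a>`;
  return makeHtmlFragment(html);
}
```

Because every argument passes through `escapeHtml`, markup inside an argument ends up as visible text rather than as tags in the fragment.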
Learnings from the POC
Some things we noted as out of scope for the POC but might be worth considering for the next iteration are listed below.
Sanitization
- Perhaps we should sanitize before writing any unsafe HTML to the database. If a ZObject is accessed from outside Visual Editor, it can currently return unsafe HTML.
- Links using <a> are not allowed as HTML in wikitext. People need to use the wikitext syntax [https://... label] for external links and [[...]] for internal links. But because we insert an HTML fragment, wikitext inside it is not rendered either, which currently leaves us no way to include links. To be improved.
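A check before storage could start from something like the sketch below, which lists the tag names in a fragment that are not on an allowlist. The allowlist here is an assumption for illustration; the real list is whatever the MediaWiki Sanitizer permits. A regex scan like this is not a sanitizer (it ignores attributes and malformed markup) and would only serve as an early warning, not as the safety boundary.

```javascript
// Assumed allowlist for illustration only.
const ALLOWED_TAGS = new Set(["b", "strong", "i", "em", "u"]);

// Return the sorted, de-duplicated names of tags in `html` that are not allowed.
function findDisallowedTags(html) {
  const tagPattern = /<\/?([a-zA-Z][a-zA-Z0-9-]*)/g; // opening or closing tag names
  const found = new Set();
  for (const match of html.matchAll(tagPattern)) {
    const name = match[1].toLowerCase();
    if (!ALLOWED_TAGS.has(name)) {
      found.add(name);
    }
  }
  return [...found].sort();
}
```

Applied to the Z89K1 value of a ZObject, an empty result means no tag outside the allowlist was seen; a non-empty result could be used to reject the object or to warn the author.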
Visual Editor
- In the POC, Z89 is not yet used as an input argument to functions in Visual Editor, only as output.
- The Preview shows the HTML as a plain string rather than in an Ace editor field. We do not touch this for the POC, but something better would be nicer, especially for long HTML.
Frontend/Vue
- Guide the user even better in inserting valid HTML:
  - Validate user input in the code editor asynchronously on blur, so that error messages appear earlier than in the publish dialog.
  - I currently warn the user about unclosed tags, invalid tags, and the like, as mentioned above, but only when the Ace editor is in HTML mode. The same HTML can obviously also be returned from a JS code implementation like the one below, and currently we don't warn the user that the script is not allowed there:
return {
  "Z1K1": "Z89",
  "Z89K1": `<strong>${Z23781K1}</strong><script>setTimeout(function(){window.alert('I killed visual editor')},1000);</script>`
}
- There is no preview of what the output HTML would look like when rendered, neither in Wikifunctions nor in the Preview of VisualEditor.
- Indentation is tricky. If you try to make the HTML look good in JS, it might add extra \t characters or similar and indent the result in a strange way. It's valuable to fix this.
- Design should spend some time on this. The POC just uses whatever we have out of the box. UX could potentially be improved for this. See initial design ticket
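One possible way to address the indentation problem is to strip the common leading indentation from a template literal before returning it. The helper below is a simplified sketch; the name `dedent` and the choice to expand each tab to two spaces are assumptions, not part of the POC.

```javascript
// Remove the common leading indentation from a multi-line string and
// drop blank lines at the start and end (a simplified "dedent").
function dedent(text) {
  const lines = text.replace(/\t/g, "  ").split("\n"); // tabs become two spaces
  while (lines.length && lines[0].trim() === "") lines.shift();
  while (lines.length && lines[lines.length - 1].trim() === "") lines.pop();
  // Smallest indentation among non-blank lines.
  const indents = lines
    .filter((line) => line.trim() !== "")
    .map((line) => line.match(/^ */)[0].length);
  const common = indents.length ? Math.min(...indents) : 0;
  return lines.map((line) => line.slice(common)).join("\n");
}

// Usage: the HTML can be indented to match the surrounding JS code.
const html = dedent(`
    <strong>
      <i>equal</i>
    </strong>
`);
```

The relative indentation inside the fragment is kept, so nested tags remain readable while the offset caused by the surrounding code is removed.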
Builtins
- The equality function now just checks 1.Z89K1 === 2.Z89K1. But two HTML strings can both be valid and equivalent while being indented differently:
// this
<strong><i>equal</i></strong>
// equals this
<strong> <i>equal</i> </strong>
- The validator is a built-in empty validator.
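For the equality builtin, a whitespace-insensitive comparison could look roughly like the sketch below. It assumes that whitespace directly between tags is insignificant, which is not always true for inline elements (for example, `<b>a</b> <i>b</i>` renders a space that `<b>a</b><i>b</i>` does not), so this is a starting point rather than a correct HTML equivalence check. The function names are hypothetical.

```javascript
// Collapse whitespace so that indentation differences do not affect comparison.
// Assumption: whitespace between adjacent tags carries no meaning.
function normalizeHtmlWhitespace(html) {
  return html
    .replace(/\s+/g, " ") // collapse runs of whitespace, including newlines
    .replace(/>\s+</g, "><") // drop whitespace between adjacent tags
    .trim();
}

// Compare two Z89/HTML Fragment objects by their normalized Z89K1 values.
function htmlFragmentsEqual(first, second) {
  return (
    normalizeHtmlWhitespace(first.Z89K1) === normalizeHtmlWhitespace(second.Z89K1)
  );
}
```

With this normalization the two fragments from the example above compare as equal, while fragments with different text content still differ.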
