As a Wikisource user, I want the team to investigate using Parsoid HTML for Wsexport, so it can be determined a) whether such a change would improve reliability to a meaningful degree, and b) whether the work would be manageable and within scope for the team.
Background: Wsexport currently generates its ePubs from the MediaWiki parser's HTML output, fetched with ?action=render. This was the only HTML available when the tool was created. However, Parsoid HTML is now available and provides much richer data. Migrating to Parsoid HTML might make the tool more "future proof" and could simplify some HTML transformations and cleanups (footnotes, mathematical formulas...).
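For reference, a minimal sketch of the two ways of fetching page HTML that the background contrasts. It only builds the URLs and does not reflect Wsexport's actual code. The function names are hypothetical, and the choice of the MediaWiki core REST endpoint (/w/rest.php/v1/page/{title}/html) as the source of Parsoid HTML is an assumption; the investigation may settle on a different access path.

```python
from urllib.parse import quote, urlencode


def legacy_render_url(domain: str, title: str) -> str:
    """URL for the legacy parser HTML, as fetched with ?action=render."""
    # MediaWiki titles use underscores in place of spaces.
    query = urlencode({"title": title.replace(" ", "_"), "action": "render"})
    return f"https://{domain}/w/index.php?{query}"


def parsoid_html_url(domain: str, title: str) -> str:
    """URL for Parsoid HTML via the core REST API (assumed endpoint)."""
    # Subpage slashes must be percent-encoded so the title stays one path segment.
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"https://{domain}/w/rest.php/v1/page/{encoded}/html"
```

Fetching the same page through both URLs and comparing the markup for footnotes and formulas would be one concrete way to start the pros/cons rundown.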
Acceptance Criteria:
- Investigate the primary work that would need to be done in order to use Parsoid HTML for Wsexport
- Provide a general rundown of pros/cons of using Parsoid HTML for Wsexport
- Investigate the main challenges, risks, and possible dependencies associated with implementing such a change
- Provide a general estimate/idea, if possible, of the potential impact it may have on ebook export reliability.
  - In other words, do we have a strong hunch that this could, indeed, improve reliability (and in a considerable way)? Why or why not?
- Provide a general estimate/rough sense of the level of difficulty and effort required to do such work