
Research Spike: Test Flow -> wikitext conversion
Open, Needs Triage, Public

Description

Flow has a convertToText.php script that can produce the wikitext equivalent of a Flow page. It hasn't been used much; I think it was last touched in 2016 (T90075: Update and retest the convertToText script). We should try it on some complex pages and see how well it works.

Some things to look for:

  • complex markup (if we can find or create a suitable test page)
  • handling of hidden/deleted/suppressed comments/topics
  • huge pages (e.g. mw:Support or the frwiki Teahouse)
  • something with a lot of indentation / response levels
    • complex markup when indented
  • can we get an estimate of the time needed per comment?

Event Timeline

Note that some Flow talk pages have too many discussions to be converted into one single page. We need to split them, or (though not everyone supports this) adopt T321716: MediaWiki discussions on individual pages.
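One way to do the split is to pack topics, in their original order, into successive pages that each stay under a byte budget. The sketch below is a hypothetical illustration, not part of convertToText.php: `Topic`, `split_into_pages`, and `max_bytes` are invented names, and the budget would in practice need to sit comfortably below the wiki's configured page size limit ($wgMaxArticleSize).

```python
from dataclasses import dataclass


@dataclass
class Topic:
    # Hypothetical container for one converted Flow topic.
    title: str
    wikitext: str


def split_into_pages(topics, max_bytes):
    """Greedily pack topics, in order, into pages of at most max_bytes (UTF-8).

    A single topic that is larger than max_bytes on its own still gets a
    page to itself, so such pages can exceed the budget.
    """
    pages = []
    current = []
    size = 0
    for topic in topics:
        section = f"== {topic.title} ==\n{topic.wikitext}\n"
        n = len(section.encode("utf-8"))
        # Start a new page if adding this topic would exceed the budget.
        if current and size + n > max_bytes:
            pages.append("".join(current))
            current, size = [], 0
        current.append(section)
        size += n
    if current:
        pages.append("".join(current))
    return pages
```

Keeping topics in their original order means each resulting page covers a contiguous slice of the board, which should make the pages easy to label as archives (e.g. by date range).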

Change #1057420 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/Flow@master] maintenance: Fix broken addOption() calls in convertToText.php

https://gerrit.wikimedia.org/r/1057420

I've used convertToText.php a handful of times in the last six months to convert a few Flow boards that I watch to wikitext, in order to be able to use DiscussionTools. I've accumulated a number of local hacks; the most critical is the fix above, since without it the script doesn't even start, due to an incorrect Maintenance::addOption() call.

The three main problems I see:

  1. No indentation for replies.
  2. Once you add indentation, you inevitably break a ton of content, because multi-line wikitext has historically not been supported inside ":" indentation. This is understandable, given that this wikitext syntax was created for one-line dictionary definitions, not for discussions or other free-form content. The most common breakage involves tables, <pre>, and <syntaxhighlight>. Flow led the way here by treating each reply as its own "page". I've fixed these breakages in different ways at different times: sometimes by outdenting the reply back to the far left, essentially as a new thread; other times, when it's a very simple table, by manually rewriting it as a bullet list. Leaving some replies outdented might be the safest fallback.
  3. No revision history, and thus no way to discover discussions from user contributions and no transparency that replies haven't been meddled with. The maintenance script could address this by running its logic repeatedly to build up the page, instead of once for the page as a whole. That way each reply can be saved with the original revision author and timestamp preserved. This is especially important for Flow, because its revision format and content model are completely alien to MediaWiki core, so once the extension is gone, the Flow pages will become inaccessible (they might as well be deleted). This means A) being unable to find what an account did at a certain time, which is essential for rediscovering discussions that relate to other edits around the same time, and B) being unable to verify, when reading a talk page, that a reply is genuine and has not been (un)intentionally altered by others.
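The "outdent as a fallback" approach from point 2 can be sketched as a per-reply decision: prefix each line with ":" for its depth, unless the reply contains a construct known to break under ":" indentation, in which case it is left at the far left. This is a minimal sketch assuming the problematic constructs are exactly the ones listed above (tables starting with "{|", <pre>, <syntaxhighlight>); `indent_reply` and `BLOCK_PATTERN` are hypothetical names, and a real converter would likely need a longer list.

```python
import re

# Hypothetical heuristic: constructs that break when prefixed with ':'.
# "{|" only opens a table at the start of a line; the tags can appear anywhere.
BLOCK_PATTERN = re.compile(
    r"^\{\||<pre\b|<syntaxhighlight\b", re.IGNORECASE | re.MULTILINE
)


def indent_reply(wikitext: str, depth: int) -> str:
    """Indent a reply with ':' per level, or outdent it if that would break it."""
    text = wikitext.strip()
    if depth <= 0 or BLOCK_PATTERN.search(text):
        # Fallback: leave the reply at the left margin, like a new thread.
        return text
    prefix = ":" * depth
    # Prefix every non-blank line; blank lines are kept as paragraph breaks.
    return "\n".join(prefix + line if line.strip() else line for line in text.split("\n"))
```

For example, `indent_reply("Thanks!", 2)` gives `::Thanks!`, while a reply containing a `<pre>` block is returned unindented.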

Example, after manual touch-ups: https://www.mediawiki.org/wiki/Talk:Snippets/Auto-number_headings
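Point 3 above suggests building the page up one reply at a time so that each reply becomes its own revision. A rough sketch of that replay: sort all posts by timestamp and, after each one, re-render the whole page, yielding the author, timestamp and resulting text for one revision. `Post` and `replay` are hypothetical names, and how the revisions would actually be saved with their original authors (for example, by emitting an XML dump for importDump.php) is left open here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical flattened Flow post: which topic it belongs to, who wrote it, when.
    topic: str
    author: str
    timestamp: datetime
    wikitext: str


def replay(posts):
    """Yield (author, timestamp, page_text) once per post, oldest first."""
    topics: dict[str, list[str]] = {}  # dicts keep insertion order
    for post in sorted(posts, key=lambda p: p.timestamp):
        topics.setdefault(post.topic, []).append(post.wikitext)
        # Re-render the full page as it looked right after this post.
        page = "\n".join(
            f"== {title} ==\n" + "\n".join(replies)
            for title, replies in topics.items()
        )
        yield post.author, post.timestamp, page
```

Re-rendering the whole page on every step is quadratic in the number of posts, which illustrates the runtime concern raised below; it is kept here only for clarity.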

Thanks for sharing those results!

IMO the reasonable objective is to archive talk pages during conversion, which means the conversion result should be easy to understand visually, but not necessarily easy to understand as wikitext. Indentation could then be done via HTML markup, which is much more robust than wikitext indents.
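A minimal sketch of HTML-based indentation, assuming each reply is wrapped in a `<div>` with a left margin proportional to its depth. Putting the reply body on its own lines inside the `<div>` keeps line-start syntax such as "{|" tables and lists at the start of a line. `render_reply_html` and the 1.6em step are illustrative choices, not anything the script currently does.

```python
def render_reply_html(wikitext: str, depth: int, em_per_level: float = 1.6) -> str:
    """Wrap a reply in an indented <div> instead of prefixing it with ':'."""
    if depth <= 0:
        return wikitext
    margin = f"{depth * em_per_level:g}"  # e.g. 3 * 1.6 -> "4.8"
    # The body goes on separate lines so block wikitext still parses.
    return f'<div style="margin-left: {margin}em">\n{wikitext}\n</div>'
```

Unlike ":" prefixes, this never needs an outdent fallback, at the cost of wikitext that is noisier to read or edit, which matches the archive-first goal described above.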

Wrt page history, I don't know if it's worth the effort. I think talk page and user talk page histories are not used much (archives of project talk pages like village pumps are much more important; I'm not sure whether any wiki used Flow for those), and converting each revision rather than each page would massively increase complexity and runtime. The script could just add a small note that the comment was edited by others, much like Flow itself shows.
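The lighter-weight alternative can be sketched as a per-comment annotation: when the last editor of a post differs from its original author, append a small note after the comment. The function name and the note's wording are hypothetical; the point is only the author comparison.

```python
from typing import Optional


def annotate_edit(wikitext: str, original_author: str, last_editor: Optional[str]) -> str:
    """Append a note if someone other than the original author last edited the post."""
    if last_editor is None or last_editor == original_author:
        return wikitext
    note = f"<small>(Last edited by [[User:{last_editor}|{last_editor}]].)</small>"
    return f"{wikitext}\n{note}"
```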

Change #1057420 merged by jenkins-bot:

[mediawiki/extensions/Flow@master] maintenance: Fix broken addOption() calls in convertToText.php

https://gerrit.wikimedia.org/r/1057420