Write a script to automatically migrate Flow boards to sub-pages
Closed, Resolved; Public

Description

A key step in the Flow sunsetting sequence we are defining in T370722 will be automatically migrating to sub-pages any Flow boards that volunteers have not already moved themselves.

In this task, we'll investigate what approach(es) we might take for the "automatic migrating" described above and, ultimately, converge on and implement one approach.

Migration requirements

Note: any approach will need to account for T371769 (and the guidance @Urbanecm_WMF shared in T371769#10135854).

Requirements

The script needs to do the following, for any given project:

  1. Identify all of the non-subpages where Flow is being used on the wiki
  2. Move said Flow-using non-subpages to subpages using the following naming convention: Namespace:Page Name/Flow
  3. Once the script has executed steps "1." and "2.", run the fix inconsistent boards maintenance script and purge the workflow cache
  4. Output a log that shows:
    • All of the non-subpages where Flow was being used on the wiki
    • Names of the sub-pages to which each of these Flow-using non-subpages was moved
      • Note: in cases where the automatic move failed, note this so that we can move these pages manually.
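
As a rough illustration of requirements 2 and 4, here is a minimal Python sketch of the move-and-log loop. It is not the actual maintenance script: `MoveRecord`, `target_title`, `migrate`, `format_log`, and the `move_page` callback are all hypothetical names, and the real page move, the board discovery in step 1, and the follow-up scripts in step 3 are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class MoveRecord:
    source: str            # Flow-using page found in step 1
    target: str            # sub-page it was (or should have been) moved to
    error: Optional[str]   # None on success, otherwise the failure reason


def target_title(title: str, suffix: str = "Flow") -> str:
    """Apply the 'Namespace:Page Name/Flow' naming convention."""
    return f"{title}/{suffix}"


def migrate(boards: Iterable[str],
            move_page: Callable[[str, str], None]) -> List[MoveRecord]:
    """Try to move every board and record the outcome of each attempt."""
    log = []
    for source in boards:
        target = target_title(source)
        try:
            move_page(source, target)   # hypothetical callback doing the real move
            log.append(MoveRecord(source, target, None))
        except Exception as exc:        # keep going; failed moves are handled manually
            log.append(MoveRecord(source, target, str(exc)))
    return log


def format_log(log: List[MoveRecord]) -> str:
    """One line per board: where it went, or why the move failed."""
    lines = []
    for rec in log:
        status = "moved" if rec.error is None else f"FAILED ({rec.error})"
        lines.append(f"{rec.source} -> {rec.target}: {status}")
    return "\n".join(lines)
```

Catching the exception per page, rather than aborting the run, is what lets the log list the failed moves for manual follow-up.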

Open questions

  • 1. What do we expect to happen if/when a Flow-using non-subpage has sub-pages "beneath" it? Do we want them to be moved, or do we want them to be left where they are?
    • To make this choice, we will:
      • 1) Gather a list of all of the Flow-using non-subpages at Phase 0 wikis | @VPuffetMichel
      • 2) Manually review the sub-pages of these "Flow-using non-subpages" | @Trizek-WMF
      • 3) Answer this question | Team

Approaches

This section includes the range of approach(es) we could consider taking for automatically migrating Flow boards to sub-pages that volunteers didn't already move themselves.

  • Approach #1
    • Description
    • Technical uncertainties/complexities
  • Approach #2
    • Description
    • Technical uncertainties/complexities
  • Approach #3
    • Description
    • Technical uncertainties/complexities

Done

  1. The approaches we could take for automatically migrating Flow boards to sub-pages that volunteers didn't already move themselves are documented in the "Approaches" section above
  2. A decision is made about what approach we'll move forward with
  3. The approach we converged on is implemented and functional

Event Timeline

ppelberg renamed this task from [SPIKE] Investigate Flow migration script to [SPIKE] Investigate Flow automatic migration approaches. Aug 2 2024, 10:40 PM

Sorry, moved this on the wrong board.

ppelberg renamed this task from [SPIKE] Investigate Flow automatic migration approaches to Write a script to automatically migrate Flow boards to sub-pages. Oct 8 2024, 6:33 PM
ppelberg updated the task description.
ppelberg moved this task from Inbox to Ready to Be Worked On on the Editing-team (Kanban Board) board.
ppelberg added a project: Goal.

General notes:

  • At least from my experience on MediaWiki.org there are a non-trivial number of Flow boards with zero topics and an empty or no header. It's probably better to delete those outright rather than moving them to subpages.
  • Another common situation is where there are zero topics and the only content in the header is a transclusion of {{LQT page converted to Flow}}. In that case the Flow board can likewise just be deleted, and the Flow board /LQT Archive 1 page can also be deleted.
  • Would suggest using /Flow Archive rather than /Flow, because that's what the Beta feature does when you deactivate it. In theory, anyway.
  • Be careful what "Non-subpage" means. https://www.mediawiki.org/wiki/VisualEditor/Feedback is technically a subpage but should be included in this process.
  • Are you planning to run convertToText.php or something like it? None of that is mentioned here, and if you don't the result is that the Flow discussions will disappear into the Ether entirely, which probably isn't what you want.
  • See some old notes at https://wikitech.wikimedia.org/wiki/Flow
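
To make the first two notes concrete, the "delete rather than move" cases could be expressed as a single predicate. This is only a sketch: `is_deletion_candidate`, `topic_count`, and `header` are hypothetical names, and how the topic count and header wikitext would actually be read out of Flow is not modeled here.

```python
import re
from typing import Optional

# Matches a header that consists solely of the LQT conversion notice template,
# optionally with template parameters.
LQT_NOTICE = re.compile(r"^\{\{\s*LQT page converted to Flow\s*(\|[^{}]*)?\}\}$")


def is_deletion_candidate(topic_count: int, header: Optional[str]) -> bool:
    """True for boards with zero topics and an empty or notice-only header."""
    if topic_count > 0:
        return False
    text = (header or "").strip()
    return text == "" or LQT_NOTICE.match(text) is not None
```

Any board with at least one topic, or with other header content, would still go through the normal move to a sub-page.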

Thank you for sharing this feedback, @Pppery. Some follow-up questions in-line below...

  • At least from my experience on MediaWiki.org there are a non-trivial number of Flow boards with zero topics and an empty or no header. It's probably better to delete those outright rather than moving them to subpages.

Can you say a bit more here? What – if any – consequences/complications can you see resulting from archiving empty boards?

Parallel question for @DLynch: what – if any – complexity would be involved with doing as @Pppery proposed above: deleting (rather than archiving) Flow boards with zero activity on them?

  • Another common situation is where there are zero topics and the only content in the header is a transclusion of {{LQT page converted to Flow}}. In that case the Flow board can likewise just be deleted, and the Flow board /LQT Archive 1 page can also be deleted.

Similar to the question above: what – if any – consequences/complications can you see resulting from archiving boards that are empty except for a transcluded header?

  • Would suggest using /Flow Archive rather than /Flow, because that's what the Beta feature does when you deactivate it. In theory, anyway.

Good spot and this sounds good to me. Although, before revising the requirements, can you please share what "Beta feature" you're referring to here?

Mmm, I see. What do you think might be a more apt way of framing this requirement?

  • Are you planning to run convertToText.php or something like it? None of that is mentioned here, and if you don't the result is that the Flow discussions will disappear into the Ether entirely, which probably isn't what you want.

cc @DLynch

Funny coincidence: the Editing Team (the team responsible for the visual editor) is the team (of which I'm a part) that will be taking on this deprecation work.

Thank you!

Can you say a bit more here? What – if any – consequences/complications can you see resulting from archiving empty boards?

Nothing, I guess. At some point when Flow is uninstalled all Flow boards will have to be deleted (or else they will probably throw an exception when they are viewed). ConvertToText on an empty board will presumably produce a blank wikitext page. And then the wiki will have a blank page as a talk page. I'll probably run an adminbot to delete the leftover blank pages on MediaWiki.org, but I figured it would be cleaner to delete them automatically rather than going around that circuitous route.

Similar to the question above: what – if any – consequences/complications can you see resulting from archiving boards that are empty except for a transcluded header?

The same here - a bunch of pointless leftover pages will be left behind. The Flow team of the past created this situation, it should be their responsibility to clean it up, not mine.

Good spot and this sounds good to me. Although, before revising the requirements, can you please share what "Beta feature" you're referring to here?

I'm referring to the "Structured Discussions on user talk page" Beta feature. The same beta feature that was mostly disabled in T248309 but users who have it on can still turn it off, which would in theory move their talk page to /Flow archive.

In practice it breaks the world instead, as https://www.mediawiki.org/wiki/Talk:Structured_Discussions/Deprecation#%22Fatal_exception_of_type_'Flow\Exception\InvalidDataException'%22 shows.

Although if Flow is going to be uninstalled soon it hardly matters.

Funny coincidence: the Editing Team (the team responsible for the visual editor) is the team (of which I'm a part) that will be taking on this deprecation work.

I think I knew that in theory. Still, you-as-VisualEditor-maintainers need to make sure that you-as-Flow-maintainers don't cause problems with the VisualEditor feedback feature. Comments on T224851 suggest it will probably work anyway, but I would make sure.

It would also be nice to delete the templates created in T111098 if they are now unused. But I'm just ranting at this point.

See also T332103: Research Spike: Test Flow -> wikitext conversion, where I did a review of the old Flow/convertToText.php script that was prototyped in 2014 but never used, in case that's used as basis here.

I've used convertToText.php a handful of times in the last six months to convert a few Flow boards, that I watch, to wikitext in order to be able to use DiscussionTools. I've got a number of local hacks, […].

The three main problems I see:

  1. No indentation for replies.
  2. Once you add indentation, you inevitably break a ton of content because multi-line wikitext has historically not been supported inside :. This is understandable, given that this wikitext syntax was created for one-line dictionary definitions, not discussions or other free-form content. The most common breakage is tables, <pre>, and <syntaxhighlight>. Flow led the way here by treating each reply as its own "page". I've fixed these different ways at different times. Sometimes by outdenting the reply back to the far-left as essentially a new thread. Other times, when it's a very simple table, by manually rewriting it as a bullet list. Leaving some replies outdented might be the safest fallback.
  3. No revision history, and thus no way to discover discussions from user contributions, and no transparency that replies haven't been meddled with. The maintenance script could address this by running its logic repeatedly to build up the page, instead of once for the page as a whole. That way each reply can be saved with the original revision author and timestamp preserved. This is especially important for Flow, because its revision format and content model are completely alien to MediaWiki core, so once the extension is gone, the Flow pages will become inaccessible (might as well be deleted). This means A) unable to find what an account did at a certain time which is essential to rediscover discussions that relate to other edits around the same time, and B) unable to verify when on a talk page that the reply is genuine and not (un)intentionally altered by others.

Example, after manual touch-ups: https://www.mediawiki.org/wiki/Talk:Snippets/Auto-number_headings
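
For the indentation problem in points 1 and 2, the "leave some replies outdented" fallback could look roughly like the Python sketch below. It is not what convertToText.php does; `Post`, `is_safe_to_indent`, and `render` are hypothetical names, the `--author` signature is a placeholder, and the check is deliberately conservative (every multi-line reply is outdented, not only those with tables, `<pre>`, or `<syntaxhighlight>`).

```python
from dataclasses import dataclass, field
from typing import List

# Block-level constructs that tend to break when prefixed with ':'.
BLOCK_MARKERS = ("{|", "<pre", "<syntaxhighlight")


@dataclass
class Post:
    author: str
    text: str
    replies: List["Post"] = field(default_factory=list)


def is_safe_to_indent(text: str) -> bool:
    """Only single-line posts without block-level markup are indented."""
    return "\n" not in text and not any(m in text for m in BLOCK_MARKERS)


def render(post: Post, depth: int = 0) -> List[str]:
    """Render a post and its replies as a list of wikitext chunks."""
    text = post.text.strip()
    signature = f"--{post.author}"       # placeholder, not a real signature
    if depth > 0 and is_safe_to_indent(text):
        lines = [":" * depth + f"{text} {signature}"]
    elif "\n" in text:
        lines = [text, signature]        # fallback: outdented, signature on its own line
    else:
        lines = [f"{text} {signature}"]  # top-level post, or unsafe one-liner left outdented
    for reply in post.replies:
        lines.extend(render(reply, depth + 1))
    return lines
```

Joining the result with newlines gives the page text; replies below an outdented post keep their own depth, so the thread structure is only lost for the posts that could not be indented safely.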

I believe point #3 above is the most pertinent for long-term sustainability. Without it, Flow discussions would effectively disappear from user contributions completely (after Flow is uninstalled).

If we go with an iterative approach where each section and reply is incrementally exported, that would solve two problems at once. It would solve the above, whilst also naturally offering a point in time to check page size (the issue @Pppery raises above about maximum page size). In Flow, the maximum page size applied at the level of a single reply rather than the entire discussion page. If incrementally adding a thread reaches above page size, you can close Archive_1 and start again on Archive_2 with the next section or something like that.
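
A minimal Python sketch of that incremental export, under stated assumptions: `export_incrementally`, the `save_revision` callback, and the `/Archive_N` title pattern are hypothetical, and for brevity the rollover happens at the reply that would exceed the limit rather than at a section boundary.

```python
from typing import Callable, Iterable, Tuple

Reply = Tuple[str, str, str]  # (author, timestamp, wikitext)


def export_incrementally(
    replies: Iterable[Reply],
    save_revision: Callable[[str, str, str, str], None],
    base_title: str,
    max_bytes: int,
) -> int:
    """Save one revision per reply; return the index of the last archive page."""
    archive_no = 1
    page_text = ""
    for author, timestamp, wikitext in replies:
        candidate = f"{page_text}\n\n{wikitext}" if page_text else wikitext
        # Roll over to the next archive page instead of exceeding the limit.
        if page_text and len(candidate.encode("utf-8")) > max_bytes:
            archive_no += 1
            candidate = wikitext
        page_text = candidate
        # Hypothetical callback: stores the full page text as a new revision
        # attributed to the original author and timestamp.
        save_revision(f"{base_title}/Archive_{archive_no}", page_text, author, timestamp)
    return archive_no
```

Because every reply becomes its own revision, the size check falls out of the loop for free, and each revision carries the original author and timestamp.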

See also T96301, T90075#1937071

(convertToText.php was used at least in 2016 on enwiki. Not sure if it was used any other times between 2016 and your uses)

There's a long and ugly backstory of broken promises here: https://en.wikipedia.org/wiki/Wikipedia_talk:Flow/Archive_15#Conversion_back_from_Flow,_or_another_set_of_false_WMF_promises, but since Flow is already on its way out there's no point in unearthing more old history.

The iterative approach sounds promising to reproduce some form of history. Even if the formatting is somewhat off in the conversion, it would preserve which comments were submitted by the same author, which might make it more fixable. I do hope that edits to Flow posts don't throw things off too badly with this approach.

My own interest in this topic comes from Miraheze, where we have 275 wikis still using Flow. The general consensus from both editors and tech is that we're ready to move away from Flow, and we've locked new wikis from enabling it. But we're waiting for a good conversion script before we start disabling it, even for the wikis that want to move away already. We're very thankful for the work that you've been doing.

I would definitely prefer to avoid having empty boards making empty pages. If that means vandalism is permanently forgotten, that's something I can live with. Our wikis are very unlikely to run into the problem of the page size limit, however.

Hi @Krinkle and everyone on this task,
We are looking at a three-phase approach to deprecating Flow. This task covers only the first phase, "make Flow boards read-only".

Converting the content to wikitext will be phase 2.
Phase 3 will be to remove all code etc.

We have not yet planned when we will work on phases 2 and 3. We are now focusing on phase 1. Just sharing the broader context.

@VPuffetMichel Thanks, that makes sense. If this is outlined somewhere on-wiki or in a parent task, that might be worth linking to from the task description.

The parent task, T332022, looks like the most relevant place for this. It does describe a detailed plan, but a different one from the new three-phase approach.

@VPuffetMichel Thanks, that makes sense. If this is outlined somewhere on-wiki or in a parent task, that might be worth linking to from the task description.

@Krinkle: great spot and agreed.

The parent task, T332022, looks like the most relevant place for this. It does describe a detailed plan, but a different one from the new three-phase approach.

I've taken a first pass at updating T332022 to reflect the 3-phased approach in T332022#10223080.

How do these edits look to you? Are there questions/ambiguities this plan creates that you think would be worth us addressing?

Change #1079804 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/Flow@master] Add maintenance script to move all flow boards on a wiki to a subpage

https://gerrit.wikimedia.org/r/1079804

Note: For user talk pages, since the wikitext user talk page was moved to a subpage when the page was converted to Flow, it should be moved back. In addition, when Flow is activated as a user preference, that preference should be turned off.

For other talk pages, when the Flow talk page is moved to a subpage, a new wikitext talk page should be created in its original place with a link to the archived Flow board.

Note: For user talk pages, since the wikitext user talk page was moved to a subpage when the page was converted to Flow, it should be moved back.

My advice is to let the user decide, as the former wikitext talk page can be very outdated. For some users, Flow was their main talk page system for 9 years.

In addition, when Flow is activated as a user preference, that preference should be turned off.

You mean activating Flow using the Beta features page? This is not possible anymore per T248309.

For other talk pages, when the Flow talk page is moved to a subpage, a new wikitext talk page should be created in its original place with a link to the archived Flow board.

That is already the case, in a way: the empty talk page placeholder invites users to start a new discussion.

What we should add is a link to the Flow board moved to a sub-page, so that users can find the discussion again.

You mean activating Flow using the Beta features page? This is not possible anymore per T248309.

I think what bugreporter is saying is to make sure that the user_options database marks the beta feature as disabled.

You mean activating Flow using the Beta features page? This is not possible anymore per T248309.

I think what bugreporter is saying is to make sure that the user_options database marks the beta feature as disabled.

And once no one is using such Beta Feature, the beta feature should be removed completely from that wiki (see $wmgFlowEnableOptInBetaFeature).

My advice is to let the user decide, as the former wikitext talk page can be very outdated. For some users, Flow was their main talk page system for 9 years.

Currently, if Flow is used as a Beta Feature and the user turns it off (no matter when it was turned on), the old archived talk page is moved back. A mass conversion script should keep this behavior for consistency.

(I've been on vacation since technically a bit before I put that patch up. I'm catching up on what was said now.)

That is already the case, in a way: the empty talk page placeholder invites users to start a new discussion.

What we should add is a link to the Flow board moved to a sub-page, so that users can find the discussion again.

Worth noting that these are mutually exclusive -- if we add any content at all on the page, it won't get the empty-page placeholder.

EDIT: actually, I was wrong there. I forgot we changed that so a page with no topics still gets the empty state.

My advice is to let the user decide, as the former wikitext talk page can be very outdated. For some users, Flow was their main talk page system for 9 years.

Currently, if Flow is used as a Beta Feature and the user turns it off (no matter when it was turned on), the old archived talk page is moved back. A mass conversion script should keep this behavior for consistency.

I know. :) However, having this as an available feature doesn't mean it is a good idea to use or mimic it.

I reason that this "unarchive" feature was created for short test periods: you turn Flow on, you see if it is a fit for you, and if not, you have an easy way to get your talk page back, even after a break of several days or weeks. It wasn't really designed for a gap that spans years.

Current Flow board activations are several years old now, as we disallowed Flow board creation a long time ago. Today you can find active boards or abandoned boards (one possible case: a user creates an account, plays with their preferences, and never returns).

Instead of restoring the old talk page, I would create links to the subpages found when the move is done. Then, it is up to the user to triage them and restore any custom presentation they had, if they want to.

Change #1079804 merged by jenkins-bot:

[mediawiki/extensions/Flow@master] Add maintenance script to move all flow boards on a wiki to a subpage

https://gerrit.wikimedia.org/r/1079804

Change #1082791 had a related patch set uploaded (by Urbanecm; author: DLynch):

[mediawiki/extensions/Flow@wmf/1.43.0-wmf.28] Add maintenance script to move all flow boards on a wiki to a subpage

https://gerrit.wikimedia.org/r/1082791

Change #1082792 had a related patch set uploaded (by Urbanecm; author: DLynch):

[mediawiki/extensions/Flow@wmf/1.43.0-wmf.27] Add maintenance script to move all flow boards on a wiki to a subpage

https://gerrit.wikimedia.org/r/1082792

Change #1082791 merged by jenkins-bot:

[mediawiki/extensions/Flow@wmf/1.43.0-wmf.28] Add maintenance script to move all flow boards on a wiki to a subpage

https://gerrit.wikimedia.org/r/1082791

Change #1082792 merged by jenkins-bot:

[mediawiki/extensions/Flow@wmf/1.43.0-wmf.27] Add maintenance script to move all flow boards on a wiki to a subpage

https://gerrit.wikimedia.org/r/1082792

Mentioned in SAL (#wikimedia-operations) [2024-10-24T14:15:29Z] <urbanecm@deploy2002> Started scap sync-world: Backport for [[gerrit:1082791|Add maintenance script to move all flow boards on a wiki to a subpage (T371738)]], [[gerrit:1082792|Add maintenance script to move all flow boards on a wiki to a subpage (T371738)]]

Mentioned in SAL (#wikimedia-operations) [2024-10-24T14:22:57Z] <urbanecm@deploy2002> Finished scap sync-world: Backport for [[gerrit:1082791|Add maintenance script to move all flow boards on a wiki to a subpage (T371738)]], [[gerrit:1082792|Add maintenance script to move all flow boards on a wiki to a subpage (T371738)]] (duration: 07m 28s)

QA is happening on the ticket for running the script: T376749

I poked around at the code of Flow wondering how easy it would be to write a patch to do the suggestions I gave above (delete rather than archive some useless stuff). What I found was that the codebase is written so abstractly that it's hard to even write "does this board have any topics" in a coherent way.

Which I guess proves you right in undeploying it, and about it being so hard to maintain.

Postscript - I ended up running a bot on MediaWiki.org to delete the classes of pages I suggested could be deleted in T371738#10212241, which made up ~7000 of the ~17000 MediaWiki.org flow pages.