
Run Flow migration script at *gomwiki*
Closed, Resolved, Public

Description

In T380911, we moved Flow pages at Phase 2b wikis to sub-pages.

This task involves the work of doing the same at gomwiki with two adjustments:

  1. Archive-destination pagename: They want "/Archive [number]" instead of "/Flow"
  2. Deletion instead: 668 of the 833 Flow boards on gomwiki are completely empty - can we please delete those instead? - See listing in P71749
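For illustration only, the requested naming rule could look like the sketch below. It assumes titles are handled in underscore (database-key) form, as in the board list later in this task, and that the first unused number is chosen. `archive_title` is a hypothetical helper, not code from Flow's maintenance scripts.

```python
def archive_title(base: str, existing: set[str]) -> str:
    """Return the first '<base>/Archive_N' title (N >= 1) not already taken.

    Hypothetical helper mimicking the '/Archive [number]' convention
    requested for gomwiki; it is not the migration script's own logic.
    """
    n = 1
    while True:
        candidate = f"{base}/Archive_{n}"
        if candidate not in existing:
            return candidate
        n += 1


# Usage: a board with no earlier archives gets "/Archive_1".
print(archive_title("चर्चा:Taiwan", set()))
```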

Deployment timing

Wednesday, April 23, 2025

Execution

Dry run: https://phabricator.wikimedia.org/P75325
Full run: https://phabricator.wikimedia.org/P75328
Dry run 2 (to see what's left over): https://phabricator.wikimedia.org/P75329


Event Timeline

ppelberg renamed this task from "Run Flow migration script at *gomwiki* wikis" to "Run Flow migration script at *gomwiki* wiki". (Mar 18 2025, 7:16 PM)
ppelberg assigned this task to zoe.
Pppery renamed this task from "Run Flow migration script at *gomwiki* wiki" to "Run Flow migration script at *gomwiki*". (Mar 18 2025, 7:21 PM)

I think I've worked out the deleting half of this puzzle as there's an API endpoint for it, but I'd like to verify that the list of pages is up-to-date.

I'm looking at feeding this list into the API and checking that the content model, page length and 'new' flag are consistent with empty pages, but if you have a more straightforward way to re-generate or validate this list of pages I'd appreciate it.
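A minimal sketch of the "feed the list into the API" step, assuming the standard MediaWiki Action API `prop=info` query with `formatversion=2` and the default limit of 50 titles per request for accounts without `apihighlimits`. The helper names are hypothetical. As the next comment explains, core's length and `new` fields are not meaningful for Flow boards, so the sketch only checks the content model (`flow-board`).

```python
from urllib.parse import urlencode

API = "https://gom.wikipedia.org/w/api.php"


def info_query_urls(titles: list[str], batch_size: int = 50) -> list[str]:
    """Split titles into batches and build one prop=info query URL per batch."""
    urls = []
    for start in range(0, len(titles), batch_size):
        batch = titles[start:start + batch_size]
        params = {
            "action": "query",
            "format": "json",
            "formatversion": "2",
            "prop": "info",
            "titles": "|".join(batch),  # multiple titles are pipe-separated
        }
        urls.append(f"{API}?{urlencode(params)}")
    return urls


def flow_board_titles(pages: list[dict]) -> list[str]:
    """Keep pages (formatversion=2 'pages' entries) whose model is flow-board."""
    return [p["title"] for p in pages if p.get("contentmodel") == "flow-board"]
```

Each URL would be fetched and its `query.pages` list passed to `flow_board_titles`; the network call itself is left out.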

The code I used to generate the list of pages to delete, in case anyone finds it useful: https://public-paws.wmcloud.org/User:Pppery/WMF%20Cleanup/Flow.ipynb

Note that core thinks all Flow pages have only one byte of content, and only one edit from Flow talk page manager. You have to use the Flow-specific APIs to interface with Flow boards.
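To make the distinction concrete, here is a sketch of asking Flow itself whether a board has any topics. It assumes the StructuredDiscussions API module `action=flow` with `submodule=view-topiclist` and the `vtllimit` parameter, and assumes the response nests topic IDs under `flow → view-topiclist → result → topiclist → roots`. Treat both as assumptions to check against the API documentation. It also ignores the board header, which the January listing counted as content.

```python
from urllib.parse import urlencode

API = "https://gom.wikipedia.org/w/api.php"


def topiclist_url(title: str) -> str:
    """Build a Flow view-topiclist request for one board."""
    params = {
        "action": "flow",
        "submodule": "view-topiclist",
        "page": title,
        "vtllimit": "1",  # one topic is enough to show the board is not empty
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def has_topics(response: dict) -> bool:
    """True if the (assumed) response shape lists at least one topic root."""
    roots = (
        response.get("flow", {})
        .get("view-topiclist", {})
        .get("result", {})
        .get("topiclist", {})
        .get("roots", [])
    )
    return len(roots) > 0
```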

I am so glad I asked... I would have absolutely tripped over that one way or another. Thank you!

Next steps
Per today's offline discussion, we're going to do the following (ordered first to last):

  1. Archive all pages (including those that were detected as being empty in January 2025)
    • This archiving is scheduled for Wednesday, April 23, 2025
  2. Defer to volunteers to delete empty Flow pages at gom.wiki
  3. Set Flow pages to read-only (T380909)

Thinking
We converged on the path forward (notably, the choice NOT to delete purportedly empty Flow pages before running the archiving script) after an investigation failed to produce a viable, scalable way of confirming that the Flow pages marked as empty ~3 months ago are, in fact, still empty.

Oh, well.

If the community still wants those pages deleted they're welcome to follow local processes to give temporary admin and bot rights to https://gom.wikipedia.org/wiki/User:Flow_cleanup_bot and I'll handle it.

Sorry, pppery, I put a couple of days into it but I'm not familiar enough with toolforge or this corner of the API to do this quickly and with confidence. I appreciate the help with the example code.

Last time we did this, pppery pointed me at a script to fix inconsistent boards.

mwscript-k8s --comment="T389247 fix boards (dry run)" -f -- Flow:FlowFixInconsistentBoards --wiki=gomwiki --dry-run | tee >(phaste)

Here's the dry run: https://phabricator.wikimedia.org/P75330

That's most of them.

A quick check suggests that these are empty boards, although the majority of boards would be empty anyway. I might speculate either that they're empty because they have no workflow ID (and perhaps adding a discussion never worked), or that they would gain a workflow ID by being edited. Either way, it looks like getting those empty ones cleaned up is perhaps back on the menu.

Rather than trying to build out a new script (something I've spent plenty of time banging my head off already), I think it would make sense to modify the existing script to add an option to delete empty Flow boards when encountered, rather than migrating them, and to hope that that's sufficient to get past these errors. I'll take a stab at better error handling so we can at least run the rest of the script to completion, but if it's a pain I'll do these in two halves so I can see if deletion helps.

For a random example:

https://gom.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%B6%E0%A5%87%E0%A4%B6:ApiSandbox#action=query&format=json&prop=revisions&titles=%E0%A4%B5%E0%A4%BE%E0%A4%AA%E0%A4%B0%E0%A4%AA%E0%A5%80%20%E0%A4%9A%E0%A4%B0%E0%A5%8D%E0%A4%9A%E0%A4%BE%3ADhanesh%20Shirodcar&formatversion=2&rvprop=ids%7Ctimestamp%7Cflags%7Ccomment%7Cuser%7Ccontent
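To see what that sandbox link asks for, its percent-encoded fragment (everything after `#`) can be decoded with Python's standard `urllib.parse.parse_qs`. This only unpacks the query parameters; it makes no network request.

```python
from urllib.parse import parse_qs

# Fragment copied from the ApiSandbox link above (the part after '#').
fragment = (
    "action=query&format=json&prop=revisions"
    "&titles=%E0%A4%B5%E0%A4%BE%E0%A4%AA%E0%A4%B0%E0%A4%AA%E0%A5%80%20"
    "%E0%A4%9A%E0%A4%B0%E0%A5%8D%E0%A4%9A%E0%A4%BE%3ADhanesh%20Shirodcar"
    "&formatversion=2&rvprop=ids%7Ctimestamp%7Cflags%7Ccomment%7Cuser%7Ccontent"
)

# parse_qs returns lists of values; each parameter appears once here.
params = {key: values[0] for key, values in parse_qs(fragment).items()}

print(params["titles"])             # the decoded page title
print(params["rvprop"].split("|"))  # the revision properties requested
```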

The "corrupt" boards with no workflow are artifacts from T131957/T154623. The web UI seems to treat them as empty boards and allow posting to them (edit: even posting using the web UI is very glitchy), at least until you try to move the page which I haven't tested.

Suggestion: Take the flowFixInconsistentBoards output, extract just the page titles, and then pass it to something like https://www.mediawiki.org/wiki/Manual:DeleteBatch.php. That should work with no coding.
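A sketch of the "extract just the page titles" step. The actual output format of `FlowFixInconsistentBoards` is not reproduced here, so the regular expression below is a HYPOTHETICAL placeholder that would need to be adapted to the real dry-run output (P75330). The list file is written one title per line, which is the input format deleteBatch.php reads.

```python
import re
from typing import Iterable


def titles_from_log(lines: Iterable[str], pattern: re.Pattern) -> list[str]:
    """Collect the 'title' group from matching lines, deduplicated, in order."""
    seen: dict[str, None] = {}
    for line in lines:
        match = pattern.search(line)
        if match:
            seen.setdefault(match.group("title").strip(), None)
    return list(seen)


def write_list_file(titles: list[str], path: str) -> None:
    """Write one page title per line for deleteBatch.php."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(f"{title}\n" for title in titles)


# HYPOTHETICAL line format: replace with a pattern matching the real output.
PATTERN = re.compile(r"^Page: (?P<title>.+?) has no workflow")
```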

This is like the fiftieth gremlin lurking in the Flow database I've encountered!

Per today's stand up...
Approach #1: modify script so that it continues running if/when it encounters an error
Start here → Approach #2: modify script so that it deletes pages if they're empty
Approach #3: Manually delete pages, after reviewing some portion of them to ensure they're empty
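Approaches #1 and #2 combine into a familiar loop shape, sketched here in Python for illustration. The real script is a PHP maintenance script, and `is_empty`, `migrate` and `delete` are hypothetical callables standing in for whatever it actually does. Failures are recorded per board instead of aborting the run, and empty boards are optionally deleted rather than migrated.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunReport:
    migrated: list[str] = field(default_factory=list)
    deleted: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)


def process_boards(
    titles: list[str],
    is_empty: Callable[[str], bool],
    migrate: Callable[[str], None],
    delete: Callable[[str], None],
    delete_empty: bool = False,
) -> RunReport:
    report = RunReport()
    for title in titles:
        try:
            if delete_empty and is_empty(title):  # Approach #2
                delete(title)
                report.deleted.append(title)
            else:
                migrate(title)
                report.migrated.append(title)
        except Exception as exc:  # Approach #1: record the error, keep going
            report.failed[title] = str(exc)
    return report
```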

After a day of experimentation I've finally realised what I should have understood from @Pppery's previous comment: these broken boards are by definition empty, so I don't need to worry about deleting user content.

To confirm my new understanding of what's going on:

  1. Flow creates one revision to represent a Flow board. This revision is a normal revision, except it has a contentmodel field which tells the web UI to render a Flow board and to pass its content to Flow
  2. The content of that revision is a JSON object which specifies a workflow ID
  3. A workflow ID is a pointer into the Flow database version of the page and not, as I'd assumed, a kind of template
  4. If you post into a broken Flow board, it gains a workflow ID and is therefore no longer broken
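The model above can be expressed as a small check on a board revision's content. The sketch ASSUMES the JSON object stores its workflow pointer under a `"flow-workflow"` key; verify that against real revision content (for example via the ApiSandbox query linked further down) before relying on it. Anything unparseable or lacking an ID is treated as broken, matching point 4.

```python
import json
from typing import Optional


def workflow_id(board_content: str) -> Optional[str]:
    """Return the workflow ID stored in a Flow board revision, or None.

    ASSUMPTION: the workflow pointer lives under the "flow-workflow" key.
    """
    try:
        data = json.loads(board_content)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    value = data.get("flow-workflow")
    return value or None


def is_broken_board(board_content: str) -> bool:
    """A board with no workflow ID renders as an empty board."""
    return workflow_id(board_content) is None
```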

I should have thought through the implications of pppery's comment more closely: the UI treats them as empty boards. Either they are empty boards or they look like empty boards to the user, and therefore can safely be deleted.

I've now discovered this for myself by writing code to skip broken boards and then breaking a board on purpose to test my code.

So, onwards with Approach #4: doing what Pppery suggested in the first place, now that I've reassured myself that I can safely feed 700 pages into the woodchipper without deleting something I shouldn't.

That isn't technically true: the Flow database is supposed to store a redundant copy of what it thinks the page ID and name that go with the workflow are. These periodically get out of sync, which causes nasty bugs. The process that corrupted those pages probably didn't do that, which is why the move script is crashing: neither FlowFixInconsistentBoards (which uses the content stored in the core database) nor FlowMoveBoardsToSubpages (which tries to find the workflow via the page ID in the Flow database) can find it.
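The two lookup paths can be illustrated with a toy consistency check. Both mappings are hypothetical in-memory stand-ins, not the real schemas: `core` maps a page ID to the workflow ID found in the board's core content (or None), and `flow` maps a workflow ID to the page ID the Flow database has recorded for it. The corrupt boards discussed here fall into the first category.

```python
from typing import Optional


def classify(core: dict[int, Optional[str]], flow: dict[str, int]) -> dict[int, str]:
    """Label each page by how its core content and Flow records line up."""
    result = {}
    for page_id, wf in core.items():
        if wf is None:
            result[page_id] = "no workflow in core content"
        elif wf not in flow:
            result[page_id] = "workflow unknown to Flow database"
        elif flow[wf] != page_id:
            result[page_id] = "Flow database points at another page"
        else:
            result[page_id] = "consistent"
    return result
```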

You're right, though, that, if a Flow board doesn't have a workflow in the core database, that means the UI will definitely appear empty, and even if you do create a topic it won't load properly, so I agree these can be safely deleted.

TLDR: You're right and can proceed, just adding more details about how Flow works.

I appreciate it! I like going relatively deep in understanding what's going on before I touch things so it's very helpful to get more detail.

I'll execute this on Monday, in the spirit of not doing deploys on a Friday night...

Mentioned in SAL (#wikimedia-operations) [2025-04-28T15:42:53Z] <zoe@deploy1003> manually-logged T389247 Beginning deletion of broken gomwiki flow boards

Mentioned in SAL (#wikimedia-operations) [2025-04-28T15:55:20Z] <zoe@deploy1003> manually-logged T389247 Completed deletion of broken gomwiki flow boards

Gosh, that logging output is super unhelpful. I really should submit a patch to fix that.

Mentioned in SAL (#wikimedia-operations) [2025-04-28T16:03:34Z] <zoe@deploy1003> manually-logged T389247 attempting migration

Re-running didn't get far (https://phabricator.wikimedia.org/P75522), which I suppose I expected.

I've modified the abuse filter so it should no longer trigger on delete attempts. So now you can probably re-run the deletion script (running it for pages that don't exist is safe) and then run the migration script, and then put this sorry task out of its misery.

That was quick; I'm afraid I picked a more annoying solution and posted to both pages with a brief explanation as to why!

That's interesting - I'm surprised that worked:

Note that the pages are still broken: https://gom.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BE%E0%A4%AA%E0%A4%B0%E0%A4%AA%E0%A5%80_%E0%A4%9A%E0%A4%B0%E0%A5%8D%E0%A4%9A%E0%A4%BE:Mat%C4%9Bj_Such%C3%A1nek/Archive_1 doesn't show the topic, probably because the core database doesn't know about the workflow. But now the Flow database does, so I guess the move worked.

The pages that were deleted were only a specific kind of empty board: one that was actually corrupt rather than empty, caused by NewUserMessage and Flow not interacting properly in 2016. Other kinds of empty boards were moved as normal.

Matěj Suchánek is one of the abuse filter cases above, which was moved rather than deleted because of that (and some other arcana).

Current list of empty boards:

  1. चर्चा:गोवा_युवा_महोत्सव/Archive_1
  2. वापरपी चर्चा:Shantakamat/Archive_1
  3. वापरपी चर्चा:Thibaut120094/Archive_1
  4. चर्चा:Mother_Teresa/Archive_1
  5. वापरपी चर्चा:Matěj_Suchánek/Archive_1
  6. वापरपी चर्चा:XXN/Archive_1
  7. वापरपी चर्चा:STACEY_MESQUITA/Archive_1
  8. वापरपी चर्चा:CreativeC/Archive_1
  9. वापरपी चर्चा:95.152.31.181/Archive_1
  10. वापरपी चर्चा:178.127.180.212/Archive_1
  11. वापरपी चर्चा:178.122.255.238/Archive_1
  12. चर्चा:M._Boyer/Archive_1
  13. वापरपी चर्चा:80.95.45.101/Archive_1
  14. वापरपी चर्चा:93.124.95.219/Archive_1
  15. वापरपी चर्चा:95.152.6.96/Archive_1
  16. वापरपी चर्चा:46.185.6.74/Archive_1
  17. वापरपी चर्चा:37.113.15.155/Archive_1
  18. वापरपी चर्चा:80.95.44.255/Archive_1
  19. वापरपी चर्चा:37.113.10.155/Archive_1
  20. वापरपी चर्चा:109.194.241.255/Archive_1
  21. वापरपी चर्चा:216.244.78.34/Archive_1
  22. वापरपी चर्चा:80.95.44.9/Archive_1
  23. वापरपी चर्चा:54.38.94.238/Archive_1
  24. वापरपी चर्चा:80.95.45.233/Archive_1
  25. वापरपी चर्चा:93.124.94.219/Archive_1
  26. वापरपी चर्चा:95.152.50.151/Archive_1
  27. वापरपी चर्चा:93.124.126.225/Archive_1
  28. वापरपी चर्चा:93.124.83.53/Archive_1
  29. वापरपी चर्चा:193.56.117.7/Archive_1
  30. वापरपी चर्चा:95.152.48.31/Archive_1
  31. वापरपी चर्चा:93.124.94.150/Archive_1
  32. वापरपी चर्चा:176.114.153.43/Archive_1
  33. वापरपी चर्चा:176.114.153.23/Archive_1
  34. वापरपी चर्चा:176.114.153.4/Archive_1
  35. वापरपी चर्चा:93.124.39.2/Archive_1
  36. वापरपी चर्चा:176.114.153.116/Archive_1
  37. वापरपी चर्चा:Abuse_filter/Archive_1
  38. वापरपी चर्चा:Bbb23/Archive_1
  39. वापरपी चर्चा:185.254.52.221/Archive_1
  40. चर्चा:जॉर्ज_एम._मोरायस/Archive_1
  41. चर्चा:जुलियांव_मेनेझेस/Archive_1
  42. चर्चा:डेलायला_लोबो/Archive_1
  43. वापरपी चर्चा:Rejoy2003/Archive_1
  44. चर्चा:जॉन_डिसोझा/Archive_1
  45. चर्चा:श्यामराव_मडकयकार/Archive_1
  46. चर्चा:आनातॉल_फ्रांस/Archive_1
  47. चर्चा:Young_Chico/Archive_1
  48. चर्चा:Codar/Archive_1
  49. चर्चा:Cuncolim/Archive_1
  50. चर्चा:Velim/Archive_1
  51. चर्चा:प्रमोद_सावंत/Archive_1
  52. चर्चा:कॉमीडियन_सेल्वी/Archive_1
  53. चर्चा:गोंयचें_पर्यटन/Archive_1
  54. चर्चा:गोंंयच्या_सायबाचे_आनी_गोंयच्या_पयल्या_सांताचे_जिवीत_आनी_वावर/Archive_1
  55. चर्चा:काशीनाथ_शेटगांवकार/Archive_1
  56. चर्चा:Frederick_Noronha/Archive_1
  57. चर्चा:पर्सिव्हल_नोरोन्हा/Archive_1
  58. चर्चा:Shashikant_Punaji/Archive_1
  59. चर्चा:झेविराचो_निरोप/Archive_1
  60. चर्चा:पोर्तुगेज_भारत/Archive_1
  61. चर्चा:लिंग्वा_फ्रांका_नोवा/Archive_1
  62. चर्चा:पुंडलीक_नायक/Archive_1
  63. चर्चा:गोंय_विद्यापीठ/Archive_1
  64. वापरपी चर्चा:Kandarpajit_Kallol/Archive_1
  65. चर्चा:Konknni_bhas_andolon/Archive_1
  66. चर्चा:Taiwan/Archive_1
  67. चर्चा:Português
  68. वापरपी चर्चा:Mickie-Mickie/Archive_1

That's 68 out of 141 boards. Probably easiest to process manually.

I did a check to make sure no topics that shouldn't have been deleted were caught in the delete batch script, and the only one affected was me testing something earlier, so fine.

@Pppery There seems to be some inconsistency in the number of flow boards on the wiki:

  • The task description says that '668 of the 833 Flow boards on gomwiki are completely empty', implying that there should be 833 - 668 = 165 non-empty Flow boards after deletion
  • Using the search function on gomwiki to find Flow boards produces a list of 741 Flow boards in existence
  • Finally, the comment T389247#10773886 indicates that 68 out of 141 boards are empty

> The task description says that '668 of the 833 Flow boards on gomwiki are completely empty', implying that there should be 833 - 668 = 165 non-empty Flow boards after deletion

That data comes from before you processed and manually deleted a bunch of boards with only a header, which reduced the count of non-empty Flow boards (they're considered non-empty by that code)

> Using the search function on gomwiki to find Flow boards produces a list of 741 Flow boards in existence

It doesn't really. For some reason CirrusSearch hasn't picked up all of the deletions in its index, so the CirrusSearch count is wrong. CirrusSearch filters out nonexistent rows that somehow got into the index before displaying results, so if you ignore the displayed count and count the search results by hand (I count 141), you get the right answer.
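The same "count what actually exists" idea can be applied outside search. A minimal sketch, assuming the result titles are re-checked with an Action API `prop=info` query using `formatversion=2`, where a nonexistent page carries `"missing": true`. This is not how CirrusSearch filters internally; it is just an independent way to reproduce a correct count.

```python
def count_existing(pages: list[dict]) -> int:
    """Count pages in a formatversion=2 query result that actually exist."""
    return sum(1 for page in pages if not page.get("missing", False))
```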

@Pppery I had handled 92 non-empty but non-significant pages, which should leave us with 165 - 92 = 73 non-empty pages. And 141 - 68 = 73 as well, so everything tallies up. Thanks a lot for the explanation.
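The reconciliation above, written out as a quick arithmetic check using only the figures quoted in this thread:

```python
# January listing (task description)
total_jan, empty_jan = 833, 668
non_empty_jan = total_jan - empty_jan

# Non-empty but non-significant boards handled manually afterwards
handled_manually = 92
expected_non_empty = non_empty_jan - handled_manually

# Current state: 141 boards, of which 68 are empty
boards_now, empty_now = 141, 68
observed_non_empty = boards_now - empty_now

assert expected_non_empty == observed_non_empty
print(non_empty_jan, expected_non_empty, observed_non_empty)
```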

What does the error say? I can't really debug blindly.

@The_Discoverer Please post text as text (not as an image), so text can be copied and text can be found when searching for it. Thanks.

Could someone who has Logstash access please look up the stack trace for that production error?

Cannot find anything in Logstash for those reqIDs; maybe I'm trying the wrong dashboards. If those errors are still reproducible, please file a separate issue.

Update

As of Wednesday, April 23, 2025, the Flow migration script has run at gom.wiki.

All that's left is the deletion of a relatively small number of empty boards, which the Editing Team assumes @The_Discoverer and @Pppery will handle.

Of course, @The_Discoverer and @Pppery, if the above runs counter to what you'd been thinking, please say so!