Algorithm to detect when the content above the first section needs to be archived and/or converted into Flow header
Closed, Duplicate · Public

Description says:

takes the content of the page above the first section and adds it to the Flow board header.

In an ideal world, this would be enough, but the Wikimedia world is far from ideal. There are plenty of discussion pages where the first poster just left a comment without any section heading (and perhaps without a signature, although bots and editors tend to clean up missing signatures, less so missing headings). While this might be a small annoyance when converting single pages or exotic namespaces, it could become a blocker in larger moves.

How to fine-tune the algorithm? Some ideas:

  • In any case, copy the entire content of the discussion page to the archived wikitext page. This means that some header content might be duplicated, which is fine. Wrong headers on Flow pages can then be safely removed, knowing that a copy exists on the archived page. This is significantly less effort than deleting content from the Flow page and adding it to the archived page.
  • If there are signatures and no templates other than "ping", "u", and similar user-mention templates, then don't use the content as the header; just archive it.
  • If there are neither templates nor signatures, don't use the content as the header; just archive it.
  • If the content above the first section starts with a template, then keep it in the header. There is a high chance that it's a banner about the origin or quality of the article.
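
The ideas above could be sketched roughly as follows. This is only an illustration, not the conversion script's actual code: the function names, the regular expressions, and the "reply to" entry in the mention list are assumptions, and signature detection assumes English-style timestamps or a link to a user page. When a mention template comes first and the comment is signed, the sketch archives it rather than treating it as a banner. Any case the list does not cover falls back to the current behaviour (use the content as the header).

```python
import re

# Assumed list of user-mention templates; the task names only "ping" and "u".
MENTION_TEMPLATES = {"ping", "u", "reply to"}

# Simplified patterns (assumptions, not MediaWiki's real parser rules).
HEADING_RE = re.compile(r"^={2,6}.+?={2,6}[ \t]*$", re.MULTILINE)
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")
SIGNATURE_RE = re.compile(
    r"\[\[\s*User(?:[ _]talk)?\s*:|\d{1,2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\)",
    re.IGNORECASE,
)


def lead_section(wikitext: str) -> str:
    """Return everything above the first section heading."""
    match = HEADING_RE.search(wikitext)
    return wikitext[: match.start()] if match else wikitext


def template_names(text: str) -> list:
    """Return normalized names of all templates found in the text."""
    return [
        m.group(1).strip().lower().replace("_", " ")
        for m in TEMPLATE_RE.finditer(text)
    ]


def classify_lead(wikitext: str) -> str:
    """Return "empty", "archive", or "header" for the lead content.

    The full page is assumed to be copied to the archive in every case;
    the result only says whether the lead also goes into the Flow header.
    """
    lead = lead_section(wikitext)
    if not lead.strip():
        return "empty"

    names = template_names(lead)
    other = [n for n in names if n not in MENTION_TEMPLATES]
    signed = SIGNATURE_RE.search(lead) is not None

    # Only mention templates plus a signature, or no templates at all
    # (signed or not): treat as a stray comment and archive it only.
    if not other and (signed or not names):
        return "archive"

    # A leading banner template, or anything the ideas do not cover:
    # keep the existing behaviour and use the lead as the header.
    return "header"
```

For example, `classify_lead("{{WikiProject Foo|class=B}}\n== Topic ==\n")` returns `"header"`, while an unsigned, template-free remark above the first heading returns `"archive"`.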