
Placement of pagenum span on page-spanning tables
Open, Needs Triage, Public

Description

On English Wikisource, we have many tables of contents that span multiple pages. However, it's a bit tricky to construct them so that the pagenum spans (from MediaWiki:Proofreadpage pagenum template) are inserted at the right position.

Imagine two Page-NS pages:

Page 1:

{|
| 1 | Chapter 1

Page 2:

|-
| 2 | Chapter 2
|}

In this case, the pagenum spans are inserted after "Chapter 1", which is technically the wrong row, but it's a valid place for content to be (it's inside a td element).

Also, when the table row is templated, whitespace between rows is "sucked" into the end of the "Chapter 1" cell, and since |- has to start on a new line, it's easy to get extra blank lines into the cell contents, even with a single-line gap in the wikitext:

{|
{{foorow|1|Chapter 1}}

{{foorow|2|Chapter 2}}

produces:

{|
| 1 | Chapter 1


|-
| 2 | Chapter 2

The other way around, with |- on the first page:

Page 1:

{|
| 1 | Chapter 1
|-

Page 2:

| 2 | Chapter 2
|}

In this case, the pagenum spans are inserted in the "dead" space between rows and therefore get "fostered". This doesn't set off the linter (I assume due to when the ProofreadPage extension runs?). As a result, all the table's pagenum spans end up at the top of the table.

On the other hand, extra whitespace between row templates is suppressed.
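To make the two layouts easier to compare, here is a rough Python sketch of the distinction. It is only a line-based heuristic, not a wikitext parser: the function name is made up, it assumes the span is inserted right after the text passed in, and it does not handle captions (|+) or nested tables.

```python
def insertion_context(preceding_wikitext: str) -> str:
    """Guess where a marker appended to `preceding_wikitext` would land.

    Hypothetical helper: looks at the last line that starts with table
    markup and classifies the position from that alone.
    """
    for line in reversed(preceding_wikitext.splitlines()):
        line = line.strip()
        if line.startswith("|}"):
            return "outside table"   # table already closed
        if line.startswith("{|") or line.startswith("|-"):
            return "dead space"      # between rows: content here gets fostered
        if line.startswith("|") or line.startswith("!"):
            return "inside cell"     # valid place for a span
    return "outside table"


# Page 1 of each layout above; the span for Page 2 is inserted right after it.
print(insertion_context("{|\n| 1 | Chapter 1\n"))      # inside cell
print(insertion_context("{|\n| 1 | Chapter 1\n|-\n"))  # dead space
```

The first layout ends Page 1 inside the "Chapter 1" cell, the second ends it after the row separator, which is where the fostering comes from.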

So, the question is: Can the ProofreadPage extension be smarter about where the pagenum template is inserted to avoid putting it in "dead" table space? Or is this something that contributors are expected to need to be aware of, and just be very careful not to inject whitespace between rows?

Background reading from when I looked into this at enWS last year: https://en.wikisource.org/wiki/Wikisource:Scriptorium/Help/Archives/2018#.7B.7BTOC_begin.7D.7D_and_family..

Event Timeline

Just an idle thought…

If fully-automatic handling of this is infeasible in the short term, would a manual-but-at-least-deterministic approach be possible, based on ProofreadPage providing a magic word that places the span wherever it appears? I'm thinking along the lines of the __TOC__ magic word in enwp mainspace: if it is present in the wikitext, the automatically generated table of contents is placed at that position in the page, and if it is not present, the position is determined automatically. With a PAGEMARK (or whatever) magic word, the contents of MediaWiki:Proofreadpage pagenum template would be inserted at that position; otherwise it would behave as it does now.

This would afford manual control to editors in those edge cases where the default placement causes problems.
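Roughly what I have in mind, as a Python sketch rather than anything resembling the real extension code (which is PHP). The __PAGEMARK__ spelling, the function name and the sample span are all made up for illustration:

```python
PAGEMARK = "__PAGEMARK__"  # hypothetical magic word, does not exist today


def place_pagenum(page_wikitext: str, pagenum_html: str) -> str:
    """Put the pagenum markup at the magic word if present, else at the start."""
    if PAGEMARK in page_wikitext:
        # Manual placement: the first occurrence wins, any others are dropped.
        placed = page_wikitext.replace(PAGEMARK, pagenum_html, 1)
        return placed.replace(PAGEMARK, "")
    # Default placement: same as the current behaviour.
    return pagenum_html + page_wikitext


span = '<span class="pagenum"></span>'  # simplified stand-in for the template output
print(place_pagenum("| 2 | Chapter 2\n|}", span))
print(place_pagenum("| 2 | __PAGEMARK__Chapter 2\n|}", span))
```

Pages without the magic word are untouched by the change, so existing transclusions would keep working as before.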

Isn't the number placement a function of the localized page numbering script (as in, not part of the extension)?

@ShakespeareFan00, no. The contents of MediaWiki:Proofreadpage pagenum template are inserted at the start of each transcluded Page. In the case of enWS, this is a span with a class of pagenum and some other bits and bobs. The local JS (on enWS, MediaWiki:PageNumbers.js) then uses those spans to insert the visible page numbers in the left margin.

The issue is that just inserting at the start means it's possible to put the span in an invalid location, in this case, between one </tr> and the next <tr>. Then it gets fostered to the table start, and the (local) JS thinks all the pages are in the same place, at the table start. In the case of the enWS JS, at least, if multiple page numbers are in the same place, only one is shown.

Hmm. Perhaps this issue could also be solved by Remex being intelligent about where this particular span belongs? At least inside a <tr>, or outside a <tr> but inside a <tbody>, the correct placement for it is inside the immediately preceding <td>. If Remex knows the difference between ordinary page content and the stuff inside the ProofreadPage-specified interface file, it should be feasible to do this at that layer with the existing infrastructure there.

There's stuff you could put inside MediaWiki:Proofreadpage pagenum template that would break inside a <td>, but those aren't things you'd ever want to put in the template; and so it would constitute either malicious intent or an edge case of an edge case.

@Xover exactly. Though I think it makes more sense for it to be pushed forward into the next <td>, otherwise it will be on a row that came from the previous page.
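As a rough illustration of the "push it forward into the next <td>" idea, here is a minimal Python sketch. It is not how ProofreadPage or Remex work; it assumes we get to rewrite the serialized HTML before the tree builder fosters anything, and that the marker is an empty span whose first attribute is class="pagenum". The function name and sample markup are made up.

```python
import re

# Assumed shape of the marker: an empty span with class="pagenum" first.
SPAN = r'<span class="pagenum"[^>]*></span>'

# A marker sitting in dead space: directly before a cell, optionally with
# the opening <tr> of the next row in between.
DEAD_SPACE = re.compile(rf'({SPAN})(\s*(?:<tr\b[^>]*>\s*)?<t[dh]\b[^>]*>)')


def push_into_next_cell(html: str) -> str:
    """Move each dead-space marker to the start of the following cell."""
    return DEAD_SPACE.sub(r'\2\1', html)


html = (
    '<table><tr><td>1</td><td>Chapter 1</td></tr>'
    '<tr><span class="pagenum" id="p2"></span><td>2</td><td>Chapter 2</td></tr>'
    '</table>'
)
print(push_into_next_cell(html))
```

A marker that is already inside a cell is not followed directly by a cell opening tag, so it is left where it is.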

@Tpt (or anyone), is it possible to get a comment on what we should do about this? Even if the comment is "fix it at enWS"?

Thousands of TOCs have broken page numbers, and the bot job to fix them is quite a lot of messing around, so it would be good to know whether it's a good idea before starting. Plus, if the fix as detailed above is made, users will need to be very careful about whitespace going forward.

> So, the question is: Can the ProofreadPage extension be smarter about where the pagenum template is inserted to avoid putting it in "dead" table space?

Yes, I definitely think it might be possible for ProofreadPage to be smarter in the future. The new Parsoid parser APIs allow extensions to tweak both the wikitext before parsing and the HTML after parsing. But it would definitely require some thought and development time to implement it properly. I believe that fixing the enwikisource pages for now is the best way to go until, hopefully, a more general fix becomes possible at some indeterminate point in the future, when Parsoid fully replaces the existing MediaWiki parser.

Thank you for the reply, that makes sense.

I will start the process of fixing it at enWS by swapping the row templates to have the |- at the start of the template, not the end.
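For anyone following along, here is a toy Python model of the two template styles and of why whitespace will matter after the swap. The foorow_* functions only imitate what a row template might expand to (the real enWS templates do more), and the check is a simple regex on the expanded wikitext, not a parser.

```python
import re


def foorow_leading(num, title):
    """New style: row separator first, then the cells."""
    return f"|-\n| {num} | {title}"


def foorow_trailing(num, title):
    """Old style: cells first, row separator last."""
    return f"| {num} | {title}\n|-"


def expand(rows, style, gap="\n"):
    """Expand the row templates, separated by `gap`, inside one table."""
    return "{|\n" + gap.join(style(n, t) for n, t in rows) + "\n|}"


def blank_line_in_cell(wikitext):
    """True if a cell line is directly followed by a blank line."""
    return re.search(r"^\|(?![-}+]).*\n[ \t]*\n", wikitext, re.M) is not None


rows = [(1, "Chapter 1"), (2, "Chapter 2")]
print(blank_line_in_cell(expand(rows, foorow_leading, gap="\n\n")))   # True
print(blank_line_in_cell(expand(rows, foorow_trailing, gap="\n\n")))  # False
```

With the separator at the start, a blank line between templates lands after the cell text and becomes part of the cell; with the separator at the end, it lands in the dead space after |- instead.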