Page MenuHomePhabricator

Pages corresponding to tables of contents' second links are not provided the first link as the "prev" parameter for headers generated with <pages>
Open, In Progress, Needs TriagePublic

Description

"Module:Header template" on a given Wikisource supplies a number of parameters automatically based on an index page if they are not otherwise supplied in the <pages header=1/> tag, including "prev", "next", "current", and so on. If the page using that tag corresponds to the second link in the corresponding index's table of contents, however, then the first link is not supplied as the "prev" parameter. This results in situations such as that in this page, which according to the book's table of contents should be noted in the header as following this page but currently isn't.

I suspect that changing ">" to ">=" on line 300 of "includes/Parser/PagesTagParser.php" would address this issue.

Event Timeline

@Mahir256 The first mainspace link on the Index: is skipped because it is assumed to be the title of the work. Unless I am misunderstanding your issue, putting a wikilink in the উইকিউপাত্ত আইটেম field should fix it.

putting a wikilink in the উইকিউপাত্ত আইটেম field should fix it.

Yes, I'm aware that throwing in an extra link is one way to sidestep the issue, but I wonder if the underlying assumption is worth hardcoding in the extension at all.

Yes, I'm aware that throwing in an extra link is one way to sidestep the issue, but I wonder if the underlying assumption is worth hardcoding in the extension at all.

This behaviour is as designed and documented, so it's not a matter of "sidestepping the issue". I'm sure that there are other ways it could be implemented too, but that would require a better definition of what the actual problem with the current behaviour is, and some sketch of a hypothetical better solution.

that would require a better definition of what the actual problem with the current behaviour is

As part of an effort to increase integration of the Bengali Wikisource with Wikidata, we have begun migrating most information within index pages that can be migrated to Wikidata items. As a result, the only free-text field in ProofreadPage that is frequently non-empty before the pagelist is the "wikidata-item" field, holding the Qid of the work's item. The "wikidata-item" field is used directly within our implementations of "Module:Header template" and "MediaWiki:Proofreadpage index template" in retrieving the migrated information for display in the appropriate places. Turning that field into a link makes processing it, even if it's merely a matter of stripping the link syntax, just a bit more complicated than it needs to be. If the information above the pagelist field is already on the Wikidata item, then adding a link to any of the fields above the pagelist becomes redundant.

and some sketch of a hypothetical better solution.

The simple fix I propose is a direct solution to the problem and to the extent I am aware of would not adversely affect most other generated headers. (I'd be interested in seeing where an adverse effect might occur, if at all.)

Turning that field into a link makes processing it, even if it's merely a matter of stripping the link syntax, just a bit more complicated than it needs to be.

Is there any particular reason why you can't add a link to the mainspace page where the work is transcluded to one of the other index fields and use a dedicated field for the QID?

to the extent I am aware of would not adversely affect most other generated headers.

It would break every single other use of <pages … header=1 /> because they rely on the documented behaviour.

Is there any particular reason why you can't add a link to the mainspace page where the work is transcluded to one of the other index fields and use a dedicated field for the QID?

We are using the "wikidata-item" field as the dedicated field for the Qid. As (implied, maybe should have been expressed better?) in the sentence following that to which you replied, a link to where the work is transcluded would have been the use for the "title" field before, but that link has been migrated to a sitelink on a Wikidata item, which can be retrieved using Lua's mw.wikibase functionality, thus making it redundant to retain it in the index page.

It would break every single other use of <pages … header=1 /> because they rely on the documented behaviour.

I'm not sure if you're trying to imply that every single other use of <pages header=1/> appears on a page corresponding to the second link in the index page and thus actually exhibits that behavior. Even if you aren't, however, the change I propose would realistically affect at most (# of index pages) pages--each of which would have exactly one "second link"--rather than the more enormous and troublesome (# of pages using <pages>).

The relevant piece of logic which this task considers was introduced in this commit, three years before Wikidata's launch and thus three years before the idea of pulling in information for an index page would have been conceived. We on the Bengali Wikisource, in the interest of modernizing the functionality of our site beyond the old days of direct {{header}} use and pre-Scribunto capabilities, are trying to embrace the benefits that Wikidata provides to the extent that it can support the metadata of index pages, and this necessarily extends from the slow discontinuation of certain index fields to the integration of their replacements in main namespace pages, in addition to the sometimes semi-automatic maintenance of some other namespaces. I can accept that such an attitude as we espouse may be foreign and uncomfortable to certain Wikisourcerors, but that doesn't mean that the software—which I feel the need to assert we wish to share with those Wikisourcerors—should continue to hold other communities back (or that the effort to defend what's being requested in this task is misplaced).

After some searches on s:en:, mw:, and Gerrit, the only acknowledgment of this behavior I was able to find is a comment from August 2019 in a Scriptorium by the objector to this task describing this behavior in response to someone else who was confused by it. Perhaps there is some non-formal record of this choice someplace which is expected to be visible but was not at the time I performed my searches (and perhaps I am not as good at searching for things as others), but absent this or an assessment that something particularly devastating might result from the addition of that equals sign, I contend that the particular one-time adjustment originating from that change is entirely manageable, at least compared to other software-related adjustments (such as the refreshment of Page:s across wikis to store proofreading status as a page property) which have had to be executed previously.

If @Tpt still finds this task and the proposal within it entirely objectionable, I will have no problem with his declining it. I would like there to be some consideration, however, of whether we ought to allow communities to dispense with the assumption that information aside from a Qid above a certain point must include some sort of link, no matter how unnecessary it might be, to conform to what they might consider parachronistic behavior.

We on the Bengali Wikisource … are trying to embrace the benefits that Wikidata provides to the extent that it can support the metadata of index pages

That's a laudable goal and one in which you are not alone (there are multiple efforts across the Wikisourcen to do the same, and there are non-Wikisource-specific technical efforts that will be important enabling technologies here that are currently being worked on). However, for one Wikisource to impose their idea of what the One True Way™ is for all the other Wikisources is not a particularly constructive approach. Dismissing as irrelevant the cleanup that other volunteers will have to do across all other projects because you want perfect technical purity in your local use case is extremely unlikely to be well received.

I would strongly recommend that you rethink this approach.

Noting that no formal documentation of the behavior of concern in this task was yet provided by this task's objector, despite the assertion made that it exists:

for one Wikisource to impose their idea of what the One True Way™ is for all the other Wikisources is not a particularly constructive approach.

Nowhere was I trying to imply that the change should be applied on account of some unsubstantiated notion of superiority, without adequate preparation on the other Wikisources. If this is a particularly breaking change, let this be pursued with the caution required for it, rather than suggesting that things like this not be pursued at all.

Dismissing as irrelevant the cleanup that other volunteers will have to do across all other projects because you want perfect technical purity in your local use case is extremely unlikely to be well received.

Nowhere was I trying to imply that, even with the aforementioned caution, the change I propose is a trivial thing to address; all I am saying is that compared to at least one other massive change that was once made affecting all Wikisources, there are considerably fewer pages across Wikisources that are actually affected by it, and that, due to how this behavior manifests, that number is necessarily limited (a rather high upper bound) by the number of index pages on a given Wikisource (for each of which there may be at most one mainspace page with a <pages header> tag with no explicit "prev" parameter, this page being that referred to by the second link on said index page).

If my initial assessment of how many pages this affects was inadequate for you, here's a somewhat better assessment:

  • Considering the English Wikisource only, of its 23,033 index pages, only 5,420 have tables of contents in them. Among those, of the 1,578 index pages which don't transclude a page from the Page: namespace, 536 have no links at all. (The current upper bound of the number of pages affected is thus 4,884.)
  • (This comment will be updated as more analyses are run.)

I would strongly recommend that you rethink this approach.

With respect to this Phabricator task, I refer you to the first point under "Reason" in this text.

Noting that no formal documentation of the behavior of concern in this task was yet provided by this task's objector, despite the assertion made that it exists:

You're looking for this page.

@Bodhisattwa @jayantanth If the current PRP behaviour is holding back progress on Bengali Wikisource I suggest one of you pare it down to a neutral "Here's what we want to achieve, and this is what we're having trouble with" description in a new Phab task. If (big if) what I am extracting from the above is correct, I feel confident there are ways this can be achieved without disrupting other projects. For example, a config variable to enable or disable the behaviour per project, so Bengali Wikisource and any others that want to go entirely to Wikidata for their Index: pages can do so as and when they are ready to do so (there are other possible approaches too). As I mentioned up above somewhere, I know there are other Wikisources that are working on this so it is likely having this configurable will be of interest to more projects either short term or at some point in the foreseeable future.

Noting that no formal documentation of the behavior of concern in this task was yet provided by this task's objector, despite the assertion made that it exists:

You're looking for this page.

@Bodhisattwa @jayantanth If the current PRP behaviour is holding back progress on Bengali Wikisource I suggest one of you pare it down to a neutral "Here's what we want to achieve, and this is what we're having trouble with" description in a new Phab task. If (big if) what I am extracting from the above is correct, I feel confident there are ways this can be achieved without disrupting other projects. For example, a config variable to enable or disable the behaviour per project, so Bengali Wikisource and any others that want to go entirely to Wikidata for their Index: pages can do so as and when they are ready to do so (there are other possible approaches too). As I mentioned up above somewhere, I know there are other Wikisources that are working on this so it is likely having this configurable will be of interest to more projects either short term or at some point in the foreseeable future.

@Xover , I have filed a new ticket T285738 as per your suggestion and tried to keep it as neutral as possible stating only the problem we are facing. Hope it helps.

@Xover , I have filed a new ticket T285738 as per your suggestion and tried to keep it as neutral as possible stating only the problem we are facing. Hope it helps.

I may have misunderstood what you are trying to achieve. The new task describes trouble with the prev/next links, for which the response is "you need to put a mainspace link on the Index: title field". This is the documented behaviour and so the problem you're seeing is "user error" (the software is working as designed, so this is not an error or a bug in the software).

What I thought I had understood was that your actual overall goal is getting rid of locally stored data in the Index:, and especially the title link, and this by-design/documented behaviour was preventing you from achieving that goal. In other words, the software is working correctly, but the way it works is getting in the way of this other thing you want to achieve.

Iff I've understood correctly (and I could certainly have misunderstood), the formulation would then be something like "We want to move all Index: fields to Wikidata, but we can't currently do that because dropping the title field link breaks the next/prev links in automatic header generation. We are looking for some way to enable us to drop that link but keep automatic headers functioning."

If my understanding is accurate, that's an issue other Wikisources with similar efforts have already, or will soon, run into, so my guess is that you'd get plenty of support for it and any solution will be reusable for several Wikisources going forward.

Thank you @Mahir256 and @Xover for this discussion.

If I try to summarize it:

  1. It is useful to be able to have the first "prev" link working for Index: pages that does not contain a "title" value because of the Wikidata migration.
  2. There are thousands of works which rely on the old behavior of the first link to ns0 being ignored when generating the prev/next links (at least in the French and English Wikisources).

I see a way that might allow us to get out of this problem: Instead of ignoring the first link to ns0, why not just make ProofreadPage only consider links in the "table of contents" field of the Index page, without skipping the first one?
This might be implemented by annotating the relevant field in "Mediawiki:Proofreadpage_index_data_config" with something like "data": "toc". If such a field is present in the configuration this field would be the only one considered for prev/next links. If this field is not present the current behavior would be kept, at least during the time required to migrate all Wikisources. This way would not break the "ignore the title" feature while allowing to fix the "prev" links for wikis that makes use of Wikidata.

@Mahir256 and @Xover what do you think about this idea? If there a use case I missed?

what do you think about this idea? If there a use case I missed?

I can't speak to technical feasibility of the approach or whether this would be adequate to let Bengali Wikisource do what they are aiming for, but apart from that this sounds like an eminently workable solution. If feasible as described this approach would "just work" for all the per-project variants I can currently think of.

In particular, this sounds like it would provide a very easy and smooth migration path at the per-project level.

But aiui the granularity would be per-project and not per-index. How would a project start experimenting with this or avoid the need for a big bang migration? Use two toc fields in the index, one for the traditional way and one for a "Wikidata-based toc"? That would necessitate PRP differentiate between "present" and "non-empty" for the index data field tagged with "data": "toc", I think. That sounds a little more complicated, but is perhaps not too much so?

But aiui the granularity would be per-project and not per-index. How would a project start experimenting with this or avoid the need for a big bang migration? Use two toc fields in the index, one for the traditional way and one for a "Wikidata-based toc"? That would necessitate PRP differentiate between "present" and "non-empty" for the index data field tagged with "data": "toc", I think. That sounds a little more complicated, but is perhaps not too much so?

If a Wikisource want to do that, it would just require to create a new field in "Mediawiki:Proofreadpage_index_data_config" and annotate it with "data": "toc". This way only index pages which would have "opt-in" by moving the toc to the new field would use the new approach. Anyway, I have a hard time figuring out what would break if a Wikisource just add "data": "toc" to the current toc field configuration. The "title" link is I believe nearly always in a "title" field and not in the TOC so it would be still ignored with the new approach.

Change 704564 had a related patch set uploaded (by Tpt; author: Tpt):

[mediawiki/extensions/ProofreadPage@master] Build prev/next link only from the TOC

https://gerrit.wikimedia.org/r/704564

Tpt claimed this task.

Yes. To make it work properly "data": "toc" should be added to the relevant field in Mediawiki:Proofreadpage_index_data_config.

Maybe the issue didn't go actually.

If we put the main page of the book at the beginning of the TOC like this TOC, the 2nd page becomes the 3rd page in TOC and thus it looked like that the issue was resolved.

For TOCs where the main page of the book is still not added at the beginning, the issue is still there. Eventually, we will add all main pages of the book in all TOCs, but maybe not every Wikisource will not follow that pattern, so the issue still needs to get resolved.

Sorry, I should not have closed this issue. The change that should fix this problem is not merged yet: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ProofreadPage/+/704564
So, it makes sense that the problem does not seem solved on Wikisource.

Sorry, I should not have closed this issue. The change that should fix this problem is not merged yet: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ProofreadPage/+/704564
So, it makes sense that the problem does not seem solved on Wikisource.

@Tpt, I guess, it is still not merged.

Change 704564 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] Build prev/next link only from the TOC

https://gerrit.wikimedia.org/r/704564

The change fixing this issue is now merged.

Now that we are on wmf.9, I see that for this index, with "data": "toc" set for the table of contents ("Remarks"), and without using the trick mentioned in T285610#7267935 on that index, the second chapter still does not link to the first chapter or vice versa. Is there something we need to refresh on our end for the effects of this patch to be visible?

@Mahir256 The change I wrote is in fact buggy. It's looking for a field named "toc" and not a field which data field is "toc". I'm writing a fix right now. Sorry for that.

Change 741682 had a related patch set uploaded (by Tpt; author: Tpt):

[mediawiki/extensions/ProofreadPage@master] Fixes lookup of field per key

https://gerrit.wikimedia.org/r/741682

Change 741682 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] Fixes lookup of field per key

https://gerrit.wikimedia.org/r/741682

Hmm. I may be insufficiently caffeinated just now, but…

On this work, the work's main page in mainspace is showing an automated header with just a prev link to the last chapter in the toc and no next link to the first chapter. The behaviour used to be the exact opposite. On the subpages for each chapter the automatic links appear to be as expected; including the last chapter having only a prev link back to the penultimate chapter and no next link.

I've (superficially) checked that the behaviour occurs across ~all works on noWS (it's the standard way to transclude there), and works using automatic header for the work's main page on enWS (most don't).

Skimming the source I find…

The test on line 293 of PagesTagParser.php is never going to succeed because the top level page (what's in the index's title field) is no longer in the $indexLinks array. Which in turn means that the test on line 316 is always going to fail and the |current=$current template param will never get passed.

The test on line 299 is always going to succeed, except on completely empty toc fields (which is an error not an edge case), which means a prev link is always going to be output even when there is no actual previous page.

The prev/next tests are outside the for loop, but are testing the loop index variable (which is a no good, very bad practice). $i is thus here always going to be $i == $indexLinksCount.

Which means that the test on line 303 is always going to fail (because the test on line 293 did, see above, so the break is never triggered and the for loop iterated over the entire array). The if here is actually testing whether $indexLinksCount + 1 < $indexLinksCount, or whether 1 + 1 < 1.

So… the quick/dumb fix is probably something like:

old
if ( $i > 0 ) {
  $prev = '[[' . $indexLinks[$i - 1]->getTarget()->getFullText() . '|' .
    $indexLinks[$i - 1]->getLabel() . ']]';
}
if ( $i + 1 < $indexLinksCount ) {
  $next = '[[' . $indexLinks[$i + 1]->getTarget()->getFullText() . '|' .
    $indexLinks[$i + 1]->getLabel() . ']]';
}
new
// Only pages in the list of toc links and after the first get a prev link.
if ($i > 0 && isset($current)) {
  $prev = '[[' . $indexLinks[$i - 1]->getTarget()->getFullText() . '|' .
    $indexLinks[$i - 1]->getLabel() . ']]';
}

// If the current page is one of the links in the list of toc links, and is
// not the last link in the list, we add a link to the next page. If we are on
// a page that's not in the list of toc links, for example the top-level
// mainspace page containing the front matter, the next link is always to the
// first page in the list of toc links. The last page in the list of toc links
// never gets a next link.
if (isset($current) && $i + 1 <= $indexLinksCount) {
  $next = '[[' . $indexLinks[$i + 1]->getTarget()->getFullText() . '|' .
    $indexLinks[$i + 1]->getLabel() . ']]';
} elseif (!isset($current)) {
  $next = '[[' . $indexLinks[0]->getTarget()->getFullText() . '|' .
    $indexLinks[0]->getLabel() . ']]';
}

This is the brute-force approach, and there may be use cases I'm not considering, so someone that's actually familiar with the code should take a closer look at the best approach to fixing it. For example, having a flag that's raised (set) by the test where we break out of the for loop would possibly lead to cleaner logic here. As could possibly returning something other than a flat list from getTableOfContentLinks(), and coding explicitly for a distinct title link if one is present (instead of making assumptions about order that the toc changes now invalidated).

Now to go find some coffee…

Change 742968 had a related patch set uploaded (by Tpt; author: Tpt):

[mediawiki/extensions/ProofreadPage@master] pages tag: Do not set previous and next links if there is no current link

https://gerrit.wikimedia.org/r/742968

@Xover Thank you so much for having found this issue. It was under my nose all this time.

I am a bit scared of linking to the first page if the current page is not in the toc. For example, the French Wikisource is used to not display the navigation links in the ns0 main page of the book. Just not having default values for prev and next if the current page is not in the TOC seems the simplest and safest behavior to me.

I just wrote a PR about it: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ProofreadPage/+/742968/
Does it looks good to you?

@Tpt As best I can tell, the automatic next link on the main page was the existing behaviour. When the work's title was in $indexLinks[0], the if test inside the for loop would get a hit and set $current. The prev link test then explicitly excluded it by skipping index 0, but the next test just checked that it wasn't the last entry in the array ($i + 1 < $indexLinksCount) and so would add a next link on the page corresponding to the work's title.

My proposed code above extended that to all pages that are not in the toc, which may indeed be a bad idea: it would mean that, e.g. a poem transcluded onto [[The Raven]] in addition to within [[The Collected Works of Edgar Allan Poe]], would always get a next link pointing somewhere down inside the containing work (e.g. [[The Collected Works of Edgar Allan Poe/Vol. 4/Part 5/Chapter 3]]). Then again, you probably wouldn't use <pages … header=1 /> on [[The Raven]] so I doubt it'd be a problem in practice.

But since the main page is no longer in $indexLinks[0] there is no convenient way to test whether we are on the work's main page and emit a next link only on that page. Hence the vague suggestion to return something other than a flat list from getTableOfContentLinks(): it would get us something to test against to more surgically emit the link rather than just blindly spit it out on all pages that are not in the toc.

I am a bit scared of linking to the first page if the current page is not in the toc.

I must agree with Xover's comment regarding the provision of a next link on a main page. While attempting to provide such a link using JavaScript and the mw.Api may be possible, I do not think that it is necessarily the most efficient use of resources when loading a Wikisource page, given that each time a page is loaded the table of contents portion of an index needs to be rendered again (meaning after ProofreadPage does so once) to determine which link to add. (The alternative would be to add the next link manually to the page, which does not fully comport with the underlying motivation behind this task.)

Change 742968 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] pages tag: Do not set previous and next links if there is no current link

https://gerrit.wikimedia.org/r/742968