Sometimes the anchor links generated by an Index: page's <pagelist /> do not have the expected class attribute and value set
Closed, ResolvedPublic

Description

Sometimes the anchor links generated by the Index: page's <pagelist /> for existing Page: namespace pages do not have the expected class attribute and value (class='qualityN prp-pagequality-N') set. In these cases, IndexPage in proofread.py fails, as it relies on this info.

A null edit of the Index page refreshes the status, and the class reappears.

Note: The anchor links generated by an Index: page's <pagelist /> for non-existing Page: namespace pages (pages yet to be created) do seem to all have the expected class attribute and value as any other redlink anchor link would (class='new').

Would be good to:

  • understand why this happens (so countermeasures might be taken, at least in pywikibot)
  • make IndexPage in pywikibot more robust or give errors when such situation appears (and possibly skip tests)
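The second point could be sketched roughly as below, using only the Python standard library. This is not pywikibot's actual implementation: the names `PagelistLinkChecker` and `links_missing_quality` are hypothetical, and the sketch assumes (as described above) that links to existing Page: pages carry a `prp-pagequality-N` class while redlinks carry `class="new"`.

```python
from html.parser import HTMLParser
import re

# Matches the quality level in e.g. class="quality4 prp-pagequality-4".
QUALITY_RE = re.compile(r'\bprp-pagequality-(\d)\b')


class PagelistLinkChecker(HTMLParser):
    """Collect Page: links and note which ones lack a quality class.

    Hypothetical helper for illustration only.
    """

    def __init__(self, page_prefix='Page:'):
        super().__init__()
        self.page_prefix = page_prefix
        self.quality = {}   # link title -> quality level (int)
        self.missing = []   # titles with neither a quality class nor "new"

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        title = attrs.get('title') or ''
        if not title.startswith(self.page_prefix):
            return
        classes = attrs.get('class') or ''
        match = QUALITY_RE.search(classes)
        if match:
            self.quality[title] = int(match.group(1))
        elif 'new' not in classes.split():
            # Existing page whose link lost its quality class.
            self.missing.append(title)


def links_missing_quality(html, page_prefix='Page:'):
    """Return titles of Page: links that have no quality class set."""
    checker = PagelistLinkChecker(page_prefix)
    checker.feed(html)
    return checker.missing
```

A caller could raise an error, or skip a test, whenever `links_missing_quality(...)` returns a non-empty list. On wikis where the namespace is localized, `page_prefix` would need to be set accordingly.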

Mpaa updated the task description.
Mpaa added subscribers: Tpt, jayvdb, Billinghurst.

Made a test on one of such pages:

>>> python scripts/touch.py -page:Index:Popular_Science_Monthly_Volume_18.djvu

brings the page back to a normal status.
So the same action as in touch.py could be performed whenever the attribute is not found in the <a ... href=... title=... /> tags while parsing the IndexPage HTML.
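A minimal sketch of that idea, assuming the client can supply three callables: one that fetches the rendered Index: HTML, one that refreshes the page (touch or purge), and one that decides whether the HTML carries the expected classes. The function name `load_with_purge_retry` and its parameters are hypothetical and are not pywikibot API.

```python
def load_with_purge_retry(fetch_html, purge, is_complete, max_purges=1):
    """Fetch HTML, refreshing and re-fetching while the check fails.

    fetch_html, purge and is_complete are caller-supplied callables
    (hypothetical names, for illustration only).
    Returns a tuple (html, complete).
    """
    html = fetch_html()
    purges = 0
    while not is_complete(html) and purges < max_purges:
        purge()              # e.g. a touch or purge of the Index: page
        purges += 1
        html = fetch_html()  # re-read the page after the refresh
    return html, is_complete(html)
```

Bounding the number of refreshes keeps the caller from looping forever if the class never reappears; in that case the second element of the result is `False` and the caller can report an error.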

Did you try a purge as well? Note that a purge is what resolves the special:indexpages issue. The (interim) value of a persistent purge of Index: namespace pages, based on recent activity in the Index: and Page: namespaces, seems to be growing.

Billinghurst renamed this task from Index page sometimes does not use to Index page sometimes does not use required class set. Sep 30 2015, 11:14 PM
Billinghurst renamed this task from Index page sometimes does not use required class set to Index page sometimes does not have required class set.

Noting that I plan to purge all pages in the Index namespace over the weekend. I had started; however, I forgot to nohup.

Also purging in en.wikisource or

python scripts/touch.py -page:Index:Popular_Science_Monthly_Volume_25.djvu -purge

works.

Change 243032 had a related patch set uploaded (by Mpaa):
proofreadpage.py: purge IndexPage when Index has no required class set

https://gerrit.wikimedia.org/r/243032

Change 243032 merged by jenkins-bot:
proofreadpage.py: purge IndexPage when Index has no required class set

https://gerrit.wikimedia.org/r/243032

From pywikibot perspective this could be closed.
Leaving this open for a while if someone wants to comment why such "no required class set" status is reached.

I'm not sure I understand the issue, mostly because the only place I can find the mentioned class='qualityN prp-pagequality-N' string in use is in the anchor links generated by the <pagelist /> assignments, as in the following example

<a title="Page:Foo.djvu/5" class="quality4 prp-pagequality-4" href="/wiki/Page:Foo.djvu/5">Title</a>

... rather than anywhere Index: page specific/related.

@Mpaa Are those PageList generated anchor links the one(s) with the 'no required class set' problem? A living example would be helpful too.

GOIII triaged this task as High priority. Dec 2 2015, 1:01 AM

Yes.
I would not know how to find one example. Moreover Billinghurst has purged Index pages, so the only option to find one is that the problem reoccurred after the purge action.

GOIII renamed this task from Index page sometimes does not have required class set to Sometimes the anchor links generated by an Index: page's <pagelist /> do not have the expected class attribute and value set. Dec 5 2015, 12:27 AM
GOIII updated the task description.

...
@Mpaa Are those PageList generated anchor links the one(s) with the 'no required class set' problem?

Yes.

Thanks for clearing that up - I took the liberty of amending this Task's title and description accordingly. I also noted that it is normal for pages that have not been created in the Page: namespace yet to have settings just like any other redlink would, class="new".

A living example would be helpful too.

I would not know how to find one example. Moreover Billinghurst has purged Index pages, so the only option to find one is that the problem reoccurred after the purge action.

My remaining 'stash' of long untouched and well forgotten Index: &/or File: pages possibly reflecting this and other deficiencies all went with that purge it seems.

I suggest we "pause" here until we can isolate a current example exhibiting your 'class dropout' problem. Hopefully, nobody will succumb to the urge of 'fixing' it before we can re-investigate the matter.

@Ankry, as of today, I see that the first 5 pages and the last 4 pages of the source file (9 in total) have been created on the Index: you linked just above.

All 9 of those pages do have the proper class string (class='qualityN prp-pagequality-N') set when I checked.

The remaining pages linked but not yet created have a [red link] status and class string set to new -- as I believe they should.

Again, isn't that the way it's always been? Maybe somebody came along between then & now and 'refreshed' the page so that is why I'm no longer seeing what you saw originally?

@GOIII The buggy page lived only a few hours, until somebody else re-edited it to "fix" the bug.

It seems to be impossible to "catch" an example, on a wiki with multiple active users, unless immediately. :(

@Ankry I pretty much figured somebody "fixed it" given the amount of time that had passed.

I did go through some of my old notes mentioning the <pagelist /> tag and came away with some ideas on what to make note of the next time somebody comes across an example of this bug.

  • Is there any difference in the way self-closing element tags are "handled"?

In other words; right now, the Index: page "template" places the element in the appropriate field like this <pagelist/> (no space before "/" (U+002F) character). Maybe the proper 'default generated' pagelist element should contain a space after the element's tag name like so: <pagelist /> ?

  • Along the same line of questioning; is there any improvement/difference between "forcing" 1-to-1 position assignments rather than substitute it when no manual changes have been made?

In other words; would it be better to have the page field of the Index: template pre-populated by default with <pagelist 1=1 /> instead of the current default <pagelist/> producing the same 1-to-1 assignment in the absence of any User: made changes?

I'm not sure either point relates to the problem at hand, but I think they are worth noting, if only to eliminate possible causes. If anybody can think of any other aspects worth investigating here, please add them. Of course, if someone is lucky enough to check and/or contrast these points in a working example, please report your findings back here. TIA.

I seem to get a similar issue but I was able to reproduce it consistently:

https://en.wikisource.org/wiki/Index:Popular_Science_Monthly_Volume_1.djvu
https://de.wikisource.org/wiki/Index:Musen-Almanach_f%C3%BCr_das_Jahr_1799

We use this for the pywikibot test T128986. The tests are failing consistently too :)

What I observe:

Site           Logged in            Not logged in
en.wikisource  colored links shown  colored links not shown
de.wikisource  colored links shown  colored links shown

Is this expected behavior or is this a side effect of this bug ?

This may just be a "lookalike" case but posting in case it may be of use:

Any arbitrary Index:blahblah.djvu may be induced into the state of "colored links not shown" at will by performing a so-called "Hard Purge", i.e. POSTing a request to

/w/api.php?action=purge&titles=Index:blahblah.djvu&forcerecursivelinkupdate=1&redirects=1

and then kicked back to the normal ("colored links shown") state again by performing a normal purge, i.e. a GET request to

/wiki/Index:blahblah.djvu?action=purge

but only whilst the user is logged out! Neither of the above actions has any effect (that I have observed) upon the logged-in state.

Both enWS and test2 environments behave this way.
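For anyone wanting to repeat this, the two requests above can be built as in the sketch below. It only constructs the method and path quoted in this comment and does not send anything; the function names are hypothetical, and the host (enWS, test2, ...) is left to the caller.

```python
from urllib.parse import quote, urlencode


def hard_purge_request(title):
    """Return (method, path) for the recursive "Hard Purge" via the API."""
    params = {
        'action': 'purge',
        'titles': title,
        'forcerecursivelinkupdate': 1,
        'redirects': 1,
    }
    # Keep ":" readable so the path matches the form quoted above.
    return 'POST', '/w/api.php?' + urlencode(params, safe=':')


def normal_purge_request(title):
    """Return (method, path) for the ordinary ?action=purge request."""
    return 'GET', '/wiki/' + quote(title, safe=':/') + '?action=purge'


print(hard_purge_request('Index:blahblah.djvu'))
print(normal_purge_request('Index:blahblah.djvu'))
```

To compare the logged-out and logged-in states, the same two requests would be sent once without and once with session cookies.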

Change 301924 had a related patch set uploaded (by Tpt):
Adds Page: pages outputted by <pagequality> tag as dependencies of the current page

https://gerrit.wikimedia.org/r/301924

Change 301924 merged by jenkins-bot:
Adds Page: pages outputted by <pagequality> tag as dependencies of the current page

https://gerrit.wikimedia.org/r/301924

This should work fine now. Could someone confirm?

The same problem happened to me just now on https://vec.wikisource.org/wiki/Indice:Poesie_e_satire_di_Pietro_Buratti_veneziano.pdf, so it doesn't seem to be solved.

We are still facing the problem at mr wikisource: at times the page loads correctly with proper color coding, and at times it does not. Yesterday it showed correctly, and just now it again shows incorrectly, as shown in this picture.

This example is taken from an Index page for a book on mr wikisource: https://mr.wikisource.org/wiki/अनुक्रमणिका:Arth_shastrachi_multatve_cropped.pdf

Wikisource Index Page Status colour coding not visible problem.png (336×764 px, 48 KB)

This is really annoying. In addition the purge button quite often doesn't appear.

A possible root cause: the Index: page may be rendered while the category table is still being updated by the (re)rendering of the Page: pages. In that situation the page status may not yet be available in the database.

Dvorapa subscribed.

This is now failing on Pywikibot's AppVeyor and Travis

@Mpaa @Dvorapa @Dalba Sorry for disturbing you. Is this problem fixed now?

Inductiveload claimed this task.

I haven't seen this for a very long time. I think we can close this, without prejudice to re-opening if this is still a problem somewhere.

Xqt subscribed.

The problem isn't fixed, see T181913. It still fails from time to time.

Change 969516 had a related patch set uploaded (by Mpaa; author: Mpaa):

[pywikibot/core@master] proofreadpage.py: fetch URL of page scan via API

https://gerrit.wikimedia.org/r/969516

Xqt reassigned this task from Inductiveload to Mpaa.

Change 969516 merged by jenkins-bot:

[pywikibot/core@master] proofreadpage.py: fetch URL of page scan via API

https://gerrit.wikimedia.org/r/969516