Commons "File usage on other wikis" does not show usage by German Wikisource
Closed, Resolved (Public)

Description

Usually the usage of a Commons file by Wikisource is displayed in the section "File usage on other wikis" of the file description, e. g. for https://commons.wikimedia.org/wiki/File:LA2-NSRW-3-0368.jpg which is used on English Wikisource.

However, for files that are used on German Wikisource this does not seem to work, e. g. for https://commons.wikimedia.org/wiki/File:Die_Gartenlaube_%281887%29_894.jpg which is used by https://de.wikisource.org/wiki/Seite:Die_Gartenlaube_%281887%29_894.jpg.

This problem exists for many files, maybe for all files used by German Wikisource.

Originally brought up in https://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&oldid=168443058#File_usage_on_other_wikis_does_not_work_for_dewikisource

Event Timeline

Aschroet created this task. Aug 12 2015, 8:55 AM
Aschroet raised the priority of this task from to Needs Triage.
Aschroet updated the task description.
Aschroet added a project: Commons.
Aschroet added a subscriber: Aschroet.
Restricted Application added subscribers: Steinsplitter, Aklapper. Aug 12 2015, 8:55 AM
El_Grafo set Security to None.

It is presumably due to the way that Proofread Page pulls and references the images when you pull them through a constructed Index: page. Have you got some djvu files in play that can be used as a point of comparison?

It seems that the "File usage" is generally rather outdated. Even on English WS there are missing links, e. g. for https://en.wikisource.org/wiki/File:Carroll_-_Phantasmagoria_and_other_poems_%281869%29.djvu. It should contain all pages from 1 to 220 according to the index: https://en.wikisource.org/wiki/Index:Carroll_-_Phantasmagoria_and_other_poems_%281869%29.djvu.

So maybe there is a general updating problem. I would assume that direct (inter-)wikilinks to a file appear immediately, but ProofreadPage inclusion is not updated regularly.

You have solved it, I just null edited the page and the image now shows. So, it might be worth getting a bot to run through your pages in the Seite: ns and (re)save them.

@Aschroet I am told that touch.py [https://www.mediawiki.org/wiki/Manual:Pywikibot/touch.py] should be able to do it. I will see if I can get that functioning for wikisourcebot at enWS, and if it is a help we can set it free on deWS. As it would only touch (null edit) files, it shouldn't have any issues with actually editing.

@Billinghurst, the question for me is: why is there no mechanism to update this information on a regular basis or, even better, based on a push mechanism? Without it, this will need to be updated again after a while.

On toollabs you will want ...

python /shared/pywikipedia/core/scripts/touch.py -lang:de -family:wikisource -namespace:102 -start:\! &

It shouldn't have failed, and I don't know why it failed, and cannot resolve that part. I can assist with the remedy. <shrug>

@Billinghurst, I will run this command. Then I can try to observe whether the recent page changes are reflected in the related GlobalUsage at Commons. Maybe I will find something interesting there.
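
For anyone wanting to check the GlobalUsage side programmatically, here is a minimal sketch, assuming the `prop=globalusage` query module of the GlobalUsage extension on the Commons API. The helper names and the sample response are made up for illustration (the sample is hand-written in the shape the API returns, not real output), and nothing is sent over the network.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def globalusage_url(file_title, limit=500):
    """Build a query URL asking Commons where a file is used on other wikis."""
    params = {
        "action": "query",
        "prop": "globalusage",
        "titles": file_title,
        "gulimit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


def wikis_using(response):
    """Return the set of wiki hostnames listed in a globalusage response."""
    wikis = set()
    for page in response.get("query", {}).get("pages", {}).values():
        for use in page.get("globalusage", []):
            wikis.add(use["wiki"])
    return wikis


# Hand-written sample response (illustrative only, page id is a placeholder).
sample = {
    "query": {
        "pages": {
            "1": {
                "title": "File:Die Gartenlaube (1887) 894.jpg",
                "globalusage": [
                    {
                        "title": "Seite:Die_Gartenlaube_(1887)_894.jpg",
                        "wiki": "de.wikisource.org",
                        "url": "https://de.wikisource.org/wiki/Seite:Die_Gartenlaube_(1887)_894.jpg",
                    }
                ],
            }
        }
    }
}

print(globalusage_url("File:Die Gartenlaube (1887) 894.jpg"))
print("de.wikisource.org" in wikis_using(sample))
```

Fetching the URL before and after touching a page and comparing the two sets would show whether the touch had any effect.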

Furthermore, maybe someone knows someone who knows about that updating mechanism and why it is not working properly.

Tpt added a subscriber: Tpt. Aug 12 2015, 3:14 PM

Here is an explanation of the current state:

I have updated the ProofreadPage extension to add the scan images of Page: pages to the file usage lists. This was not done before. See https://gerrit.wikimedia.org/r/#/c/222858/ . However, the list of files used by a page is only updated when the page is purged, because image dependencies are computed from the expanded wikitext of the page.

That is why pages that have not been purged since the deployment of this change are not reflected in the usage lists.
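
As a rough illustration of what such a purge could look like outside of a bot framework, here is a sketch that builds (but does not send) a MediaWiki API purge request. `action=purge` and its `forcelinkupdate` parameter are part of the standard action API; the helper name `purge_request` is made up here, and whether `forcelinkupdate` alone is enough to refresh the usage lists in this case is an assumption that has not been tested in this task.

```python
from urllib.parse import urlencode
from urllib.request import Request


def purge_request(api_url, titles):
    """Build a POST request that purges pages and asks MediaWiki to
    rebuild their link tables. The request is only constructed, not sent."""
    params = {
        "action": "purge",
        "titles": "|".join(titles),
        "forcelinkupdate": 1,
        "format": "json",
    }
    return Request(api_url, data=urlencode(params).encode("utf-8"), method="POST")


req = purge_request(
    "https://de.wikisource.org/w/api.php",
    ["Seite:Die Gartenlaube (1887) 894.jpg"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would actually perform the purge; that step is left out on purpose so the sketch has no side effects.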

Thanks for the explanation, @Tpt.

Did I get it right that (1) all newly created ProofreadPage pages should be fine and (2) the old ones all need to be touched/purged to have an updated "File usage"?

Ok, this means that we have two possibilities for solving this ticket:

(1) We do nothing. Future edits of the pages will then update the related "File usage" lists step by step. This process will take a rather long time to complete because of the low edit rates of pages in WS, especially once they are in validated state.

(2) We find an approach to touch all pages in all Wikisources. This is the clean solution, but compared to (1) it causes more work. Does anyone have an idea about the effort?

My comment is that Wikisource-bot was set up by me, and already shared with others, to be available to the community to do Wikisource-related work. We use it where it can be of value. I am no developer, but I can start commands, and here we would be talking on the order of fewer than one hundred.

Tpt added a comment. Aug 13 2015, 2:03 AM

@Billinghurst If you could run your bot on all Wikisources in order to purge all Page: pages, it would be amazing and would solve the issue.

Aklapper updated the task description. Aug 13 2015, 8:34 AM

I am taking this out of the code fixing space, and just making it a wikisource bot task

I am running

jsub python /shared/pywikipedia/core/scripts/touch.py -lang:xx -family:wikisource -namespace:yyy -start:\! -purge -daemonize:xxWS_yyypurge -pt:0

enWS, ns:104
frWS ns:104
itWS ns:108
mulWS ns:104
arWS ns:104
asWS ns:104
beWS ns:104
bnWS ns:104
brWS ns:102
caWS ns:102
daWS ns:104
elWS ns:100
eoWS ns:104
esWS ns:102
etWS ns:102
faWS ns:104
guWS ns:104
heWS ns:104
hrWS ns:102
huWS ns:104
hyWS ns:104
idWS ns:104
knWS ns:104
laWS ns:104
mlWS ns:106
mrWS ns:104
mlWS ns:104
noWS ns:104
plWS ns:100
ptWS ns:106
roWS ns:104
ruWS ns:104
saWS ns:104
slWS ns:100
svWS ns:104
teWS ns:104
vecWS ns:102
viWS ns:104
zhWS ns:104

That is all those configured with a non-standard namespace for Page; now I will need to hunt up a separate list for the rest.
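
To avoid typing the command once per wiki, the invocations could be generated from the language/namespace pairs. The following is only a sketch: it reuses the command template exactly as posted above, takes a handful of pairs from the list as a sample, and the helper name `touch_command` and the `purge` switch are made up for illustration. It prints the command lines and does not submit anything.

```python
import shlex

TOUCH = "/shared/pywikipedia/core/scripts/touch.py"

# A few of the language -> Page namespace pairs from the list above.
PAGE_NS = {"en": 104, "fr": 104, "it": 108, "br": 102, "el": 100}


def touch_command(lang, namespace, purge=True):
    """Assemble the touch.py invocation for one Wikisource as an argument list."""
    cmd = [
        "jsub", "python", TOUCH,
        f"-lang:{lang}", "-family:wikisource",
        f"-namespace:{namespace}", "-start:!",
    ]
    if purge:
        # Same flag as in the template above; pass purge=False to drop it.
        cmd.append("-purge")
    cmd += [f"-daemonize:{lang}WS_{namespace}purge", "-pt:0"]
    return cmd


for lang, ns in sorted(PAGE_NS.items()):
    # shlex.join quotes arguments such as "-start:!" safely for the shell.
    print(shlex.join(touch_command(lang, ns)))
```

Returning an argument list rather than a string keeps the quoting of `-start:!` out of the template and leaves it to `shlex.join` (Python 3.8 or later).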

@Billinghurst, I am not sure whether a purge is enough to update the "File usage" at Commons. It did not work for me yesterday. Could you please verify whether it does or not? If not, you could just skip that option and do a null edit instead.

Aschroet triaged this task as Normal priority. Aug 13 2015, 1:27 PM

Wikis with the standard namespace for Page (ns:250):

azWS
bgWS
bsWS
csWS
cyWS
fiWS
foWS
isWS
glWS
jaWS
koWS
liWS
mkWS
orWS
sahWS
skWS
srWS
thWS
trWS
ukWS
yiWS
zh-min-nanWS

That should be all (deWS being done by deWS as previously stated).

If I have missed any, then please speak up (reopen the task).

Billinghurst closed this task as Resolved. Aug 13 2015, 1:45 PM

all done, so I believe

Billinghurst reopened this task as Open. Aug 13 2015, 3:30 PM

@Aschroet Ugh, with -purge it fails (which I was recommended to use). It works fine without -purge, and I will have a go at those tomorrow. It will be a lot quicker as they will be in my bash history and just need a small tweak.

@Billinghurst, fine with me. I will return on Monday. Maybe I can then review the results. Thanks for your support.

I am having to give the bot a bot status because, even though it is only doing null edits, it is getting caught by the rate limiter. Temporarily assigned bot rights:

[01:22] <StewardBot> Billinghurst changed global group membership for Wikisource-bot from (none) to global-bot with the following comment: temporary permission to manage [[Phabricator:T108799]]

Not going to work: too many of these wikis are not global bot wikis. New plan required.

Run the bot on those wikis regardless of their opt-out state. It's a null edit.

Billinghurst closed this task as Resolved. Aug 14 2015, 3:53 PM

They are off and running. Fortunately I was able to push them through with billinghurst rights, so hopefully this will not be an issue for anyone's recent changes.

@Aschroet I kicked deWS too, so hopefully you can check your sets on Monday

We might do well to think about whether there is an advantage to a "Wikisource global group" where we can set bots to have rights assigned across the set of wikisources. That would be another discussion for another day, not here, not now.

Billinghurst added a comment. (Edited) Aug 15 2015, 12:55 AM

I have achieved the noisiest cleanup in history. <sigh>

@Billinghurst, I heard about it. What is the status of the bot run?

Paulis added a subscriber: Paulis. Aug 16 2015, 8:01 PM

My bot is working through the pages between D and K; after that the job is done. I hope to finish it tomorrow.

@Billinghurst, dewikisource seems to be fine now. What is the general status? Are all Wikisources finished?

[Apologies for the tardy response. Life ... priority tasks ... labs server reboots ... plus issues with pywikibot. Needed people to help me explore and to explain, and some code issues from the weekend that coincided with this. Marvellous fun!]

The situation is that I suspended the jobs on the grid (I eventually worked out how to do that). The jobs are unfinished, so I have to kill them and start new tasks from where they got to. Once I have applied the temporary bot permission changes for wikisource-bot (and not my primary account this time), I will do a slower run, picking up where the suspensions took place.

A learning for me about pywikibot and permissions: pywikibot sets its permissions for how it edits at the beginning of the run and ignores all later changes of rights (except a block), so when I tried to apply rights changes through the system during the run, they could not affect the edits already in progress. Apologies for all the noise that occurred.

With regard to the issue that caused the visible edits: there seem to be wiki pages that had trailing spaces or line breaks, presumably from some earlier point in time, and a wiki code change now strips those trailing characters, so the null edits became cleansing edits simply through the underlying MediaWiki code. My test runs of 1000 pages at enWS and 2 x 100 pages at other wikis clearly didn't hit any examples like that, so the null edits worked fine there, with no indication of trouble. Only later, when the bot ran into batches of pages that needed fixing, did it become a problem. :-(
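
A check along these lines before a large run might have flagged the affected pages. This is a minimal sketch that assumes the only normalisation at save time is the stripping of trailing whitespace described above; MediaWiki's real pre-save handling does more than that, and the function name is made up for illustration.

```python
def would_change_on_save(wikitext):
    """Return True if stripping trailing whitespace would alter the text,
    meaning a 'null edit' would likely be recorded as a real edit."""
    return wikitext != wikitext.rstrip()


# Illustrative page texts, not taken from any wiki.
print(would_change_on_save("Some proofread text.\n\n"))
print(would_change_on_save("Some proofread text."))
```

Pages for which this returns True could be skipped or handled in a separate, announced batch.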

I hope that explains what happened, and I think that it will take me about a week to get through these tasks for all wikis, though the slowest will be enwikisource.

Good news is that the task did work in populating the use of images on the respective File: pages.

Billinghurst reopened this task as Open. Aug 20 2015, 12:34 PM

The batch of wikis whose pages need touching is still incomplete.

Glaisher added a subscriber: Glaisher.

This is somewhat held up because pywikibot's touch does not seem to operate in bot mode; where a real edit results from a change in the ProofreadPage/MediaWiki space, it then floods RecentChanges.

Restricted Application added a subscriber: pywikibot-bugs-list. Sep 23 2015, 6:56 AM
This comment was removed by Billinghurst.

@Billinghurst, it seems that the blocking task has been closed. Are we now able to continue with this ticket?

@Aschroet I have some plans to test it today or tonight. I just need to work around dodgy connection issues. Then, yes! We can batch to run from where we left off.

This special page doesn't include usage through the ProofreadPage extension at Wikisource. Usage at Wiktionary may be incomplete, as Wiktionary has case-sensitive first characters and lowercase uses are not listed.

Does this warning need a fix now? https://commons.wikimedia.org/wiki/Special:GlobalUsage

Billinghurst added a comment. (Edited) Oct 12 2015, 12:36 PM

Tests for daWS were successful and I am just now submitting it to the grid to finish. If deWS wishes, we can give Wikisource-bot bot rights and run through the Page: ns to touch the pages, or they can use another bot.

The last page that I purged as Billinghurst was

Seite:Bötjer Basch.djvu/041

I have just run two further checks, each of 30 pages, before I hit the API rate limiter. No saves in the batch. <shrug> I will try some more, though it will not be too in-depth with this limiting.

(re)done
python pwb.py touch.py -lang:xx -family:wikisource -namespace:Page -start:\! -pt:0 -botflag

  • arWS
  • asWS
  • azWS
  • beWS
  • bgWS
  • bnWS
  • bsWS
  • csWS
  • cyWS
  • daWS
  • elWS
  • eoWS
  • etWS
  • faWS
  • fiWS
  • foWS
  • glWS
  • guWS
  • hrWS
  • heWS
  • caWS
  • brWS
  • hyWS

Running ...

  • enWS
  • mulWS

Noting that several of these wikis had zero pages in the Page: ns :-/

More to come. If the big wikisources want this run please give wikisource-bot bot rights. (@Aschroet for deWS)

@Billinghurst, as I understand it, the local admin Paulis has already touched all pages with a bot. @Paulis, can you please confirm?

@Aschroet & @Billinghurst : I can confirm, de-ws done (since 18.8.15)

Billinghurst closed this task as Resolved. Oct 17 2015, 12:09 PM

Good-o. Let me know if there is any need for assistance.

  • esWS
  • idWS
  • isWS
  • jaWS
  • knWS
  • liWS
  • ltWS
  • mkWS
  • mrWS
  • nlWS
  • orWS
  • saWS
  • skWS
  • srWS
  • saWS
  • sahWS
  • taWS
  • teWS

  • zh-min-nanWS

  • slWS
  • thWS
  • trWS
  • viWS
  • yiWS
  • vecWS
  • ukWS
  • mlWS
  • plWS
  • ptWS
  • esWS
  • zhWS
  • noWS
  • ruWS

still running

  • enWS
  • itWS
  • frWS
  • svWS

deWS previously done.

All complete or running, and I expect frWS and enWS to be running for many more days. Calling it closed.

@Billinghurst, thank you very much for your effort in solving this issue. Maybe later, when T97613 is solved entirely, we will need to touch Index pages as well. But for now I agree to consider this task closed.

GOIII moved this task from Backlog to Done on the ProofreadPage board. Jun 12 2016, 4:02 AM
Xqt changed the status of subtask T113450: pywikibot-touch.py needs to operate in bot mode from Declined to Resolved. Feb 3 2019, 10:00 AM