Periodical run of currently disabled special pages
Closed, Resolved (Public)

Description

Author: aliter

Description:
We're now running on Wanted Pages information that is one year old. If that's unavoidable it might be better to remove the option from the Special Pages list. I would prefer to see an occasional update, however. If daily updates are too heavy a burden, maybe monthly or quarterly?


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=47470
https://bugzilla.wikimedia.org/show_bug.cgi?id=55943

Peachey88 added a comment.Via ConduitJan 6 2009, 3:27 AM
*** Bug 16898 has been marked as a duplicate of this bug. ***
bzimport added a comment.Via ConduitJan 6 2009, 4:17 AM

danny.b wrote:

*** Bug 16898 has been marked as a duplicate of this bug. ***

bzimport added a comment.Via ConduitSep 20 2009, 11:26 PM

danny.b wrote:

*** Bug 15714 has been marked as a duplicate of this bug. ***

bzimport added a comment.Via ConduitSep 23 2009, 10:19 PM

danny.b wrote:

*** Bug 20786 has been marked as a duplicate of this bug. ***

bzimport added a comment.Via ConduitNov 19 2009, 12:32 AM

aliter wrote:

Considering the number of separate reports, this is not a trivial issue; changing severity to minor.

Raymond added a comment.Via ConduitSep 13 2010, 11:53 AM
*** Bug 25162 has been marked as a duplicate of this bug. ***
Raymond added a comment.Via ConduitSep 13 2010, 11:54 AM
*** Bug 25098 has been marked as a duplicate of this bug. ***
hashar added a comment.Via ConduitJan 18 2011, 5:59 PM

Special pages disabled for all wikis :

Ancientpages
CrossNamespaceLinks
Deadendpages
Fewestrevisions
Mostlinked
Mostrevisions
Wantedpages

We might as well hide them / disable them.

MarkAHershberger added a comment.Via ConduitJan 19 2011, 7:48 PM

Ashar, can you make this change?

Looks like http://en.wikipedia.org/wiki/Special:AncientPages is already disabled ...

MarkAHershberger added a comment.Via ConduitJan 29 2011, 1:38 AM
*** Bug 26501 has been marked as a duplicate of this bug. ***
hashar added a comment.Via ConduitFeb 1 2011, 6:56 PM

Reopening this since the root cause is not fixed.

Either:

  1. WMF sysadmins set up a periodic refresh (once per week?)
  2. Developers code something to hide the disabled special pages and stop puzzling users with never-updated caches.
bzimport added a comment.Via ConduitMar 14 2011, 1:04 PM

danny.b wrote:

*** Bug 16871 has been marked as a duplicate of this bug. ***

Jarry1250 added a comment.Via ConduitMar 22 2011, 11:05 PM

It was mentioned on bug #14786 that WantedPages could be not all that resource intensive. Might periodic updates not still remain an option, then, even on the larger wikis (including en.wp)?

minatahatsune added a comment.Via ConduitMar 23 2011, 5:09 AM

Yeah, including vi.wp, too.

Peachey88 added a comment.Via ConduitMay 4 2011, 6:27 AM
*** Bug 28710 has been marked as a duplicate of this bug. ***
Catrope added a comment.Via ConduitMay 15 2011, 9:51 AM
*** Bug 9265 has been marked as a duplicate of this bug. ***
MarkAHershberger added a comment.Via ConduitJun 17 2011, 11:45 PM

pdhanda is supposed to figure out a schedule to get these updates to run more often .... We also plan on updating the page to say the NEXT time it will run.

MarkAHershberger added a comment.Via ConduitJun 30 2011, 3:57 PM

Tim has said some queries should never be run. I've asked him to add a list here, but I suspect they are the same ones that are listed in Comment #18.

Also, need to find someone in Ops to schedule runs of the other queries since pdhanda probably won't have a chance to do it.

Reedy added a comment.Via ConduitAug 18 2011, 1:06 PM
*** Bug 30439 has been marked as a duplicate of this bug. ***
Bennylin added a comment.Via ConduitSep 26 2011, 12:37 PM

(In reply to comment #9)

Not only Special:WantedPages needs to be updated because of being disabled ->
changing the summary.

I guess monthly update of all of them on all projects spreading the work
through entire month (say each day run update of currently disabled special
pages on 1/30 of wikis or some other division) could work.

It wouldn't be too expensive at once (server side need) and it's good enough
for users - better than the current nothing (client side need). Both satisfied.

Less expensive pages currently disabled could be run weekly on the same
principle...

Also, small wikis, as has been said in comments above, can generate these pages
very cheaply, so they could be run daily as they used to be. Maybe some weighting
according to the # of pages could work as well, say (e.g.) sites with 1-10k
pages daily, 10k-100k weekly, 100k-300k bi-weekly, 300k+ monthly or so...

Any other solution than permanent disabling is better.

Agree with Danny. Should define "small wikis" first.
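
For discussion, the size-based weighting quoted above could be written down as a simple lookup. This is only a sketch: the function name `update_interval_days` is hypothetical, the thresholds are the example figures from the quoted comment, and two things are assumed that the comment does not say (wikis under 1k pages fall in the daily tier, and a count exactly on a boundary goes to the less frequent tier).

```python
def update_interval_days(page_count: int) -> int:
    """Suggested number of days between updates for a wiki of this size.

    Thresholds are the example values from the quoted proposal, not an
    agreed definition of "small wiki".
    """
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    if page_count < 10_000:
        return 1    # daily
    if page_count < 100_000:
        return 7    # weekly
    if page_count < 300_000:
        return 14   # bi-weekly
    return 30       # roughly monthly


if __name__ == "__main__":
    for pages in (5_000, 50_000, 200_000, 2_000_000):
        print(pages, "pages ->", update_interval_days(pages), "days")
```

Defining "small wikis" then comes down to agreeing on the numbers in this table.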

Krinkle added a comment.Via ConduitOct 28 2011, 9:48 PM

Even without special "more often" treatment, having all wikis treated as big wikis is good enough too. Anything is better than the current situation.

Last update October 2009...

bzimport added a comment.Via ConduitMay 12 2012, 9:03 PM

Thehelpfulonewiki wrote:

Any update?

Matanya added a comment.Via ConduitJul 23 2012, 7:25 AM

I think we can just remove those pages. The benefit is too low in comparison to what we gain.

Jarry1250 added a comment.Via ConduitJul 23 2012, 8:40 AM

(In reply to comment #34)

I think we can just remove those pages. The benefit is too low in comparison to
what we gain.

I can't agree with that at the moment. We are yet to have a comment here about which queries are do-able for smaller wikis; yet more might be optimisable enough to run on larger wikis. There's no way I can advocate sweeping these pages under the rug until that issue is looked into.

Bennylin added a comment.Via ConduitJul 23 2012, 9:54 AM

For those active on smaller wikis who wish to have updated stats: you might as well download the dump, run the queries locally, and publish the results to your community. I did it that way on id.wp. Granted, it's not going to be weekly, and it might not be feasible for larger wikis, but I see that nobody has shared this before, so it's an option.

bzimport added a comment.Via ConduitAug 26 2012, 1:16 PM

lambdav wrote:

Blocker bug cannot be minor.

Krinkle added a comment.Via ConduitAug 26 2012, 2:40 PM
*** Bug 1861 has been marked as a duplicate of this bug. ***
Krinkle added a comment.Via ConduitAug 26 2012, 2:46 PM

Currently disabled (last updated October 2009):

  • DeadendPages
  • AncientPages
  • LonelyPages
  • UncategorizedCategories
  • WantedPages
  • WantedTemplates

Currently disabled (last updated 2007):

  • FewestRevisions
MZMcBride added a comment.Via ConduitAug 26 2012, 3:23 PM

(In reply to comment #35)

I can't agree with that at the moment. We are yet to have a comment here about
which queries are do-able for smaller wikis; yet more might be optimisable
enough to run on larger wikis. There's no way I can advocate sweeping these
pages under the rug until that issue is looked into.

I split this issue out to bug 39667 ("Divide wikis into database lists by approximate size for performance engineering"). Punishing small wikis due to their larger brethren has never made sense. This needs to be fixed.

bzimport added a comment.Via ConduitAug 26 2012, 7:53 PM

servien wrote:

If anyone can answer me, that would be great. If it can be turned on, that
would be even better.

Special:Wantedpages hasn't been updated since 2009 on the Low Saxon Wikipedia
(nds-nl), what is the reason for that, is it possible to turn this feature on?
Someone has compiled a list in the past for the Low Saxon Wikipedia using a
bot, unfortunately this user isn't active anymore. Is there someone who can
help me with this?

Nemo_bis added a comment.Via ConduitAug 26 2012, 8:07 PM

(In reply to comment #41)

Special:Wantedpages hasn't been updated since 2009 on the Low Saxon Wikipedia
(nds-nl), what is the reason for that, is it possible to turn this feature on?

Nobody really knows.

Someone has compiled a list in the past for the Low Saxon Wikipedia using a
bot, unfortunately this user isn't active anymore. Is there someone who can
help me with this?

https://wiki.toolserver.org/view/DBQ

bzimport added a comment.Via ConduitOct 11 2012, 5:08 PM

dcduring wrote:

For English Wiktionary I would be very happy if this were run monthly or quarterly limited to pages in principal namespace, wanted from pages in principal namespace. An annual run to and from all spaces might be sufficient for other maintenance, IMO.

jayvdb added a comment.Via ConduitOct 12 2012, 3:20 AM

Once a year would be good for large projects. They could then build wikiprojects to manage the task of creating all important missing articles.

Amire80 added a comment.Via ConduitOct 17 2012, 10:36 AM

So, here's another specific request from the Kyrgyz Wikipedia:
They find https://ky.wikipedia.org/wiki/Special:DeadendPages useful for improving their content and encouraging participation. Unfortunately, it was last updated in 2009.

Is there really a significant performance problem with updating these pages? I'm going through the comments here, and unless I'm missing something, no actual examples of performance issues were given.

Bennylin suggested to "define small wikis". Maybe instead of defining them we could just put some time/CPU/RAM limit on the query?

Nemo_bis added a comment.Via ConduitNov 18 2012, 11:23 PM

Bug 39667 is progressing, but in the meantime I've tried to suggest an alternative approach at gerrit change 33713.
The "idea" would be to update only one page per cluster at a time, (one or) two times per year: but for all wikis.

Nemo_bis added a comment.Via ConduitDec 7 2012, 1:14 PM

Further (non-)updates: I sent an e-mail to wikitech-l on this about a week ago but got no feedback so far (I never have luck with wikitech-l emails ;) ).
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/65463/

Aklapper added a comment.Via ConduitJan 4 2013, 3:09 PM

(In reply to comment #47)

Further (non-)updates: I sent an e-mail to wikitech-l on this about a week ago
but got no feedback so far (I never have luck with wikitech-l emails ;) ).

Try again & make clear why it's an important issue & what you want to know? :)

Jarry1250 added a comment.Via ConduitJan 4 2013, 3:16 PM

Andre, can you chase this up with Ops? Open an RT ticket or something?

https://gerrit.wikimedia.org/r/#/c/33713/ needs review and/or merging. Only Ops can flick that switch, which is the bottleneck here.

bzimport added a comment.Via ConduitJan 4 2013, 4:32 PM

dcduring wrote:

Would it be possible to do at least annual runs of WantedPages for principal namespace on en.wiktionary.org?

All the redlinks are annoying to look at. Some are valid terms that we would want to include. Others just need to be unwikilinked because they would not meet our standards for inclusion. I can understand why the smaller wikis get more benefit for the resource cost, but the large wikis benefit as well. Even if some dump processing could, in principle, generate the same content, inclusive MW runs are good ways of catching idiosyncratic uses of the wiki, some of which are undesirable.

Would you need a vote from en.wiktionary to support this?

JackPotte added a comment.Via ConduitJan 4 2013, 4:33 PM

and fr.wikt please ;)

Bawolff added a comment.Via ConduitJan 4 2013, 4:57 PM

(In reply to comment #50)

Would you need a vote from en.wiktionary to support this?

That's unnecessary. We know everyone wants the special pages being run and I'm sure that the moment we're sure they can be safely run, they will be.

Reedy added a comment.Via ConduitJan 17 2013, 6:21 PM
*** Bug 15755 has been marked as a duplicate of this bug. ***
Malafaya added a comment.Via ConduitJan 25 2013, 11:30 AM

The special pages haven't updated in 5 days at pt.wikt. Is it related to the moving of servers?

Reedy added a comment.Via ConduitJan 25 2013, 4:59 PM

(In reply to comment #54)

The special pages haven't updated in 5 days at pt.wikt. Is it related to the
moving of servers?

Seems quite likely.

(In reply to comment #55)

Check this link, for example:
https://pt.wiktionary.org/wiki/Especial:Categorias_pedidas

This one is 6 days old:
https://nl.wikipedia.org/wiki/Speciaal:GevraagdeCategorie%C3%ABn

Those aren't disabled special pages. This is not the correct bug.

Aklapper added a comment.Via ConduitJan 25 2013, 5:13 PM

Malafaya: As this is not about "Currently disabled special pages", I've copied comment 54 and comment 55 to bug 44348.

Reedy added a comment.Via ConduitJan 26 2013, 8:12 PM

Created attachment 11692
Current log from update special pages log

It's missing the temporarily disabled querypages for frwiki, but should be fairly indicative for the time being

Attached: updateSpecialPages.log

Reedy added a comment.Via ConduitJan 30 2013, 4:30 PM

reedy@fenari:~$ time mwscript updateSpecialPages.php enwiki --override | tee ~/public_html/enwikispecialpages.log
Statistics 1m completed in 21.33s
ValidationStatistics completed in 7.72s
Ancientpages got 1000 rows in 3h 33m 10.58s
BrokenRedirects got 31 rows in 9m 12.57s
Deadendpages got 617 rows in 2h 26m 22.34s
Disambiguations got 1000 rows in 13m 49.71s
DoubleRedirects got 223 rows in 4m 45.21s
FileDuplicateSearch cheap, skipped
LinkSearch cheap, skipped
Listredirects got 1000 rows in 0.26s
Lonelypages got 1000 rows in 4m 16.68s
Longpages cheap, skipped
MIMEsearch got 0 rows in 0.00s
Mostcategories got 1000 rows in 1h 12m 31.90s
Mostimages got 1000 rows in 11m 15.93s
Mostinterwikis got 1000 rows in 10m 9.26s
Mostlinkedcategories cheap, skipped
Mostlinkedtemplates got 1000 rows in 4h 5m 54.14s
Mostlinked got 1000 rows in 18h 24m 55.82s
Mostrevisions got 1000 rows in 14h 48m 27.91s
Fewestrevisions got 1000 rows in 13h 58m 12.41s
Shortpages cheap, skipped
Uncategorizedcategories got 216 rows in 31m 28.03s
Uncategorizedpages got 125 rows in 29m 38.84s
Uncategorizedimages got 57 rows in 1m 13.81s
Uncategorizedtemplates got 1000 rows in 1.84s
Unusedcategories got 1000 rows in 18.40s
Unusedimages got 1000 rows in 18.55s
Wantedcategories got 1000 rows in 17m 49.86s
Wantedfiles got 1000 rows in 17m 57.45s
Wantedpages got 1000 rows in 7h 17m 20.02s
Wantedtemplates got 1000 rows in 1h 41m 22.78s
Unwatchedpages got 1000 rows in 1m 49.07s
Unusedtemplates got 1000 rows in 39.74s
Withoutinterwiki got 1000 rows in 5m 41.06s

real 4210m17.172s
user 0m0.920s
sys 0m0.112s

^ Nearly 3 days to run
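
To compare the timings in a log like the one above, the durations can be converted to seconds and sorted. The following is a minimal sketch, not part of updateSpecialPages.php: the helper names `parse_duration` and `parse_log_line` are hypothetical, and the line format is assumed to be exactly what is printed above ("<Page> got <N> rows in <duration>").

```python
import re

# Durations as printed in the log, e.g. "3h 33m 10.58s", "9m 12.57s" or "0.26s".
_DURATION = re.compile(r"(?:(\d+)h )?(?:(\d+)m )?(\d+(?:\.\d+)?)s")
_LOG_LINE = re.compile(r"(\w+) got \d+ rows in (.+)")


def parse_duration(text: str) -> float:
    """Convert a duration such as '18h 24m 55.82s' to seconds."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)


def parse_log_line(line: str):
    """Return (page, seconds) for 'got N rows in ...' lines, otherwise None."""
    match = _LOG_LINE.fullmatch(line.strip())
    if match is None:
        return None  # e.g. "Longpages cheap, skipped" or "... completed in ..."
    return match.group(1), parse_duration(match.group(2))


if __name__ == "__main__":
    sample = [
        "Mostlinked got 1000 rows in 18h 24m 55.82s",
        "Listredirects got 1000 rows in 0.26s",
        "Longpages cheap, skipped",
    ]
    timings = [entry for entry in map(parse_log_line, sample) if entry]
    for page, seconds in sorted(timings, key=lambda item: item[1], reverse=True):
        print(f"{page}: {seconds / 3600:.2f} h")
```

For reference, the wall-clock total of 4210 minutes reported by `time` is about 2.9 days.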

Nemo_bis added a comment.Via ConduitJan 30 2013, 4:38 PM

Created attachment 11715
Update of all reports, also disabled, on de.wiki

Reedy also did de.wiki.
I commented on gerrit change 33713: The worst case is mostlinked, 18h on en.wiki. It got to completion without problems and the hit slave (in pmtpa, now idle after eqiad migration) had only a few seconds of lag for a few minutes every now and then, no significant load. This seems safe enough to merge, even significantly more cautious than needed.

Attached: dewikispecialpages.log

bzimport added a comment.Via ConduitJan 30 2013, 5:47 PM

dcduring wrote:

  1. Can I take WP's experience as a reasonable indication of the relative cost of various special pages in en.wikt, which probably has higher link density and has many widely transcluded templates, but has mostly short pages?
  2. We have been having some discussions, which seem to suggest that these runs are not so useful as to warrant high frequency.
  3. Furthermore, we seem to need more discrimination in, for example, WantedPages, which we seem to be able to provide by working the dump. Even such reports don't seem to be needed very frequently, as we don't yet have the capability to break the list down by language, which would make it much more useful.
bzimport added a comment.Via ConduitFeb 2 2013, 5:01 PM

lambdav wrote:

From the log about times taken to update page on en.wiki, tasks can be scheduled smartly, by using previous time read from log files.

For a total period of 2 months:

  • if it can run in less than 10m -> once every 2 days
  • if it can run in less than 1h -> once every 2 weeks
  • if it can run in less than 5h -> once per month
  • longer -> once every 2 months

I computed a time charge of about 7%.
I can attach the spreadsheet file I used (ODS format) if you wish.

Also, it would be good to review algorithms used to compute these pages, and maybe databases need refactoring to optimize such computations.
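
The runtime-based tiers listed above can be sketched as follows. This is an illustration under assumptions, not the spreadsheet calculation: the names `interval_days` and `load_fraction` are hypothetical, the two-month period is taken as 60 days, a month as 30 days, and queries are assumed to run one at a time.

```python
from typing import Mapping

PERIOD_DAYS = 60  # "total period of 2 months", assumed to mean 60 days


def interval_days(runtime_seconds: float) -> int:
    """Days between runs for a query with the given previous runtime."""
    if runtime_seconds < 10 * 60:
        return 2    # under 10 minutes: every 2 days
    if runtime_seconds < 60 * 60:
        return 14   # under 1 hour: every 2 weeks
    if runtime_seconds < 5 * 60 * 60:
        return 30   # under 5 hours: monthly
    return 60       # longer: every 2 months


def load_fraction(runtimes: Mapping[str, float]) -> float:
    """Share of the whole period spent running the given queries."""
    busy_seconds = sum(
        seconds * PERIOD_DAYS / interval_days(seconds)
        for seconds in runtimes.values()
    )
    return busy_seconds / (PERIOD_DAYS * 86400)


if __name__ == "__main__":
    # Two timings taken from the en.wiki log; add the rest to get the full figure.
    example = {"Mostlinked": 66295.82, "BrokenRedirects": 552.57}
    print(f"{load_fraction(example):.2%}")
```

Fed with all the en.wiki timings from the log, a calculation of this kind should land close to the roughly 7% quoted above.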

Reedy added a comment.Via ConduitFeb 2 2013, 10:25 PM

(In reply to comment #62)

Also, it would be good to review algorithms used to compute these pages, and
maybe databases need refactoring to optimize such computations.

In most cases, it's probably not worth the overhead/cost of doing the refactorings.

MZMcBride added a comment.Via ConduitMar 14 2013, 4:32 AM

(In reply to comment #3)

For small projects like this one, generating such a page doesn't take more
than a few seconds (1.69 sec in this case). Maybe special pages like these
should be enabled in the updateSpecialPages.php, but only for a selected
number of wikis (actually, most of them would fit here).

Related:

  • bug 43668: Re-enable disabled Special pages on small wikis (wikis in small.dblist)
  • bug 46094: Re-enable disabled Special pages on medium wikis (wikis in medium.dblist)
Nemo_bis added a comment.Via ConduitApr 23 2013, 7:16 AM
*** Bug 47470 has been marked as a duplicate of this bug. ***
Krenair added a comment.Via ConduitMay 21 2013, 7:06 PM
*** Bug 48678 has been marked as a duplicate of this bug. ***
liangent added a comment.Via ConduitJun 16 2013, 9:02 AM

Given this bug can't be resolved in short time, I made a version on Tool Labs:

http://tools.wmflabs.org/liangent-php/index.php/zhwiki~~wgUseDatabaseMessages=0?title=Special:%E6%96%AD%E9%93%BE%E9%A1%B5%E9%9D%A2&uselang=en

Let me know if your wiki wants this too.

Nemo_bis added a comment.Via ConduitJun 16 2013, 9:04 AM

(In reply to comment #67)

Given this bug can't be resolved in short time,

Actually, Asher gave the green light for the patch, Reedy amended it and it could be merged any time soon. :)

gerritbot added a comment.Via ConduitJul 30 2013, 8:50 PM

Change 33713 merged by Dzahn:
(bug 15434) Periodical run of currently disabled special pages

https://gerrit.wikimedia.org/r/33713

Nemo_bis added a comment.Via ConduitJul 31 2013, 11:38 AM

Created attachment 13028
crontabs on the maintenance server terbium, including the new ones for this bug

So, this bug can now hopefully be considered (mostly) fixed, with its current summary, after mutante approved and fixed the change above.
In detail, the special pages 1) AncientPages, 2) DeadendPages, 3) MostLinked, 4) MostRevisions, 5) WantedPages, 6) FewestRevisions will be updated twice a year on every wiki as follows:
a) page 1) in 1st and 7th month of the year, page 2) in 2nd and 8th month etc.,
b) starting at 1 UTC on each of the days from the 11th to the 17th of the month, where the wikis on database s1 run on the 11th, those on s2 on the 12th etc., as listed on https://noc.wikimedia.org/dbtree/ .
(You can see the crontabs attached, as provided by mutante.) In short we should first see DeadendPages (which is among the slowest) updated for en.wiki on August 11, on it.wiki, pl.wiki etc. the next day and so on.
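
The schedule in a) and b) can be restated as a small sketch. It does not reproduce the attached crontabs or the puppet change; the names `months_for` and `day_for` are hypothetical, and the count of seven database shards (s1 to s7) is inferred from the 11th-17th range rather than stated.

```python
# Order of the pages as numbered 1) to 6) in the comment.
PAGES = [
    "AncientPages", "DeadendPages", "MostLinked",
    "MostRevisions", "WantedPages", "FewestRevisions",
]
FIRST_DAY = 11   # s1 runs on the 11th
SHARD_COUNT = 7  # assumption: s1..s7, one per day from the 11th to the 17th


def months_for(page: str) -> tuple:
    """Months (1-12) in which the page is updated: n and n + 6."""
    position = PAGES.index(page) + 1
    return (position, position + 6)


def day_for(shard: int) -> int:
    """Day of the month on which wikis on database s<shard> are updated."""
    if not 1 <= shard <= SHARD_COUNT:
        raise ValueError(f"unknown shard: s{shard}")
    return FIRST_DAY + shard - 1


if __name__ == "__main__":
    for page in PAGES:
        print(page, "-> months", months_for(page))
    for shard in range(1, SHARD_COUNT + 1):
        print(f"s{shard} -> day {day_for(shard)}")
```

With this mapping, DeadendPages falls in months 2 and 8 and s1 on day 11, which matches the August 11 date given for en.wiki.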

The next steps, in order, are:

  1. keep an eye on the first updates to see whether they are successful and if they overload the servers too much, in which case they may be disabled;
  2. if all goes well, make the frequency higher or much higher, e.g. monthly (as Tim put it, "If they don't break the site, then why not run them every week?"), or decide that this is enough;
  3. try and add updates for the pages disabled only on en.wiki, fr.wiki and perhaps wikidata (see https://git.wikimedia.org/blob/operations%2Fmediawiki-config.git/351a084f4a26fc7daeeccadedba48706f251664a/wmf-config%2FInitialiseSettings.php#L9308).

This bug should be kept open at least till (1), so for a couple weeks more; 2-3 may be split to other bugs, but ideally we'll be more confident with testing, they'll follow in short order and we'll be able to close this bug to our complete satisfaction.

The queries will happen against the new databases in Ashburn, if I understand correctly, so let's thank (and wish in) the power of the new datacentre. Kudos to all the WMF people who helped transform my rough proposal into something real, including mutante, Reedy, Asher Feldman, Peter Youngmeister, Tim Starling, Ariel Glenn.

Attached: file_15434.txt

Aklapper added a comment.Via ConduitAug 15 2013, 3:36 PM

(In reply to comment #70 by Nemo)

So, this bug can now hopefully be considered (mostly) fixed, with its current
summary, after mutante approved and fixed the change above.

Nemo: Let's close as RESOLVED FIXED then?

Nemo_bis added a comment.Via ConduitAug 15 2013, 4:35 PM

(In reply to comment #71)

(In reply to comment #70 by Nemo)
> So, this bug can now hopefully be considered (mostly) fixed, with its current
> summary, after mutante approved and fixed the change above.

Nemo: Let's close as RESOLVED FIXED then?

I'm planning to check the results of the crontab later today.

Nemo_bis added a comment.Via ConduitAug 15 2013, 7:59 PM

(In reply to comment #73)

Aren't the pages

No, only DeadendPages, on s1-5 as of today. However the update didn't work on any of the wikis I checked on those DBs, whether big or small. :[
Can a shell user please check the logs on /home/mwdeploy/updateSpecialPages/ ?

gerritbot added a comment.Via ConduitAug 15 2013, 9:43 PM

Change 79279 had a related patch set uploaded by Nemo bis:
Make SpecialPages Titlecase in misc::maintenance::updatequerypages

https://gerrit.wikimedia.org/r/79279

gerritbot added a comment.Via ConduitAug 16 2013, 11:18 AM

Change 79279 merged by ArielGlenn:
Make SpecialPages Titlecase in misc::maintenance::updatequerypages

https://gerrit.wikimedia.org/r/79279

Nemo_bis added a comment.Via ConduitAug 17 2013, 6:54 AM

(In reply to comment #76)

Change 79279 merged by ArielGlenn:
Make SpecialPages Titlecase in misc::maintenance::updatequerypages

https://gerrit.wikimedia.org/r/79279

The fix has been approved but didn't go live on the server (puppet had been disabled), so we have to wait till next month (September 11-17) for updates to Special:MostLinked, to know how all this works.

bzimport added a comment.Via ConduitSep 5 2013, 11:11 PM

danny.b wrote:

Coming from https://meta.wikimedia.org/wiki/Tech/News/2013/34

Half a year on all wikis? Quite nonsensical. :-/

Small wikis with hundreds or thousands of articles can simply be updated much more often.

Also, a half-yearly update of very frequently changing special pages such as Uncategorized*, Double/Broken redirects etc. doesn't make sense.

Better to disable and hide such special pages completely than to provide obsolete results for half a year, which will only confuse people as they do now.

Why wasn't it scaled as suggested in the proposal in comment #9?

Nemo_bis added a comment.Via ConduitSep 14 2013, 11:36 PM

(In reply to comment #77)

The fix has been approved but didn't go live on the server (puppet had been
disabled), so we have to wait till next month (September 11-17) for updates
to Special:MostLinked, to know how all this works.

It seems it's working, but we need to be cautious: I checked en.wiki for s1, it.wiki for s2, fr.quote for s3, commons for s4 and they have been updated.
The ganglia graphs show that slave lag was not very significantly affected (worst case probably s2 with some 1.5 s lag on one server; then Commons with an average 1.4 s over two hours on one server), while other metrics were affected more. In order:
https://ganglia.wikimedia.org/latest/?r=custom&cs=09%2F11%2F2013+00%3A00&ce=9%2F12%2F2013+12%3A00&tab=ch&vn=&hreg[]=db10%2843|49|50|51|52%29
https://ganglia.wikimedia.org/latest/?r=custom&cs=09%2F12%2F2013+00%3A00&ce=9%2F13%2F2013+12%3A00&tab=ch&vn=&hreg[]=db10%2802|09|18%29
https://ganglia.wikimedia.org/latest/?r=custom&cs=09%2F13%2F2013+00%3A00&ce=9%2F14%2F2013+12%3A00&tab=ch&vn=&hreg[]=db10%2803|10|35%29
https://ganglia.wikimedia.org/latest/?r=custom&cs=09%2F14%2F2013+00%3A00&ce=9%2F15%2F2013+12%3A00&tab=ch&vn=&hreg[]=db10%2804|11|20%29
(more eyeballs and conclusions/interpretations appreciated).

If, as it seems, the current setup is not going to kill the cluster :) , I'd proceed with some patches for step 2 or 3 as per comment 70 in a few days.

gerritbot added a comment.Via ConduitSep 17 2013, 8:48 PM

Change 84632 had a related patch set uploaded by Nemo bis:
Periodical run of remaining currently disabled special pages on en.wiki

https://gerrit.wikimedia.org/r/84632

gerritbot added a comment.Via ConduitSep 17 2013, 8:51 PM

Change 84635 had a related patch set uploaded by Nemo bis:
Periodical run of disabled special pages: make updates monthly

https://gerrit.wikimedia.org/r/84635

Nemo_bis added a comment.Via ConduitSep 17 2013, 8:56 PM

(In reply to comment #79)

If, as it seems, the current setup is not going to kill the cluster :) , I'd
proceed with some patches for step 2 or 3 as per comment 70 in a few days.

The other databases look even more bored, so I submitted two more patches to make updates for the 6 reports in comment 70 monthly and to add updates for the 6 reports disabled on en.wiki only (with the current frequency i.e. every 6 months).
They will probably sit in gerrit for a while... or perhaps not, we'll see. I'm told the new database guru is Sean Pringle, adding to cc. :)

Springle added a comment.Via ConduitSep 17 2013, 9:50 PM

I saw when these queries ran but didn't know what they were at the time. Thanks for cc'ing me.

Don't make the mistake of thinking the databases are bored ;-) That's a slippery slope. However, I'm ok with these jobs going ahead providing the slave lag doesn't suffer unduly, and the innodb purge activity has enough chance to keep-up/catch-up between jobs. The latter may mean using non-contiguous day-of-month on cron jobs hitting the same shard.

But let's see how it goes. I'll merge it.

gerritbot added a comment.Via ConduitSep 17 2013, 9:52 PM

Change 84632 merged by Springle:
Periodical run of remaining currently disabled special pages on en.wiki

https://gerrit.wikimedia.org/r/84632

gerritbot added a comment.Via ConduitSep 17 2013, 9:54 PM

Change 84635 merged by Springle:
Periodical run of disabled special pages: make updates monthly

https://gerrit.wikimedia.org/r/84635

Malafaya added a comment.Via ConduitSep 18 2013, 9:13 AM

Does this somehow fix the problem of special pages (all of them) currently not being automatically refreshed?

Nemo_bis added a comment.Via ConduitSep 18 2013, 10:04 AM

(In reply to comment #86)

Does this somehow fix the problem of special pages (all of them) currently
not being automatically refreshed?

That's a separate issue with the separate cronjob on non-disabled special pages.

bzimport added a comment.Via ConduitOct 5 2013, 2:41 AM

William915 wrote:

(In reply to comment #78)

Coming from https://meta.wikimedia.org/wiki/Tech/News/2013/34

Half a year on all wikis? Quite nonsensical. :-/

Small wikis with hundreds or thousands of articles can simply be updated much
more often.

Also, a half-yearly update of very frequently changing special pages such as
Uncategorized*, Double/Broken redirects etc. doesn't make sense.

Better to disable and hide such special pages completely than to provide obsolete
results for half a year, which will only confuse people as they do now.

Why wasn't it scaled as suggested in the proposal in comment #9?

I totally agree. Please make updates more frequent on small wikis.

Ata added a comment.Via ConduitOct 5 2013, 8:13 PM

ato4ka wrote:

I came here from Wikisource. Special:WantedPages in enwikisource was last updated 04:20, 16 October 2009.
https://en.wikisource.org/w/index.php?title=Special:WantedPages
Unbelievable.

Nemo_bis added a comment.Via ConduitOct 5 2013, 11:01 PM

As said above, the update is now set to be monthly. In 5 days from now we should see the stream of updates.

MZMcBride added a comment.Via ConduitOct 6 2013, 2:09 AM

(In reply to comment #89)

I came here from Wikisource. Special:WantedPages in enwikisource was last
updated 04:20, 16 October 2009.
https://en.wikisource.org/w/index.php?title=Special:WantedPages
Unbelievable.

I wanted to refute this incredulousness with stats about how large the English Wikisource is, but it turns out it's not very large. ;-)

MariaDB [enwikisource_p]> select count(*) from pagelinks\G
*************************** 1. row ***************************
count(*): 8390968
1 row in set (8.53 sec)

MariaDB [enwikisource_p]> select count(*) from page\G
*************************** 1. row ***************************
count(*): 1457066
1 row in set (0.41 sec)

I know it's difficult to believe, but the maintenance Special pages situation _is_ improving, just very slowly. Unfortunately, enwikisource is considered a large database (cf. https://noc.wikimedia.org/conf/large.dblist), so even bug 46094 won't help here. Nemo's efforts should, though.

Andyrom75 added a comment.Via ConduitOct 6 2013, 7:27 AM

The special page that shows the most requested pages on https://it.wikivoyage.org (https://it.wikivoyage.org/wiki/Speciale:PagineRichieste) hasn't been updated since November 2012; that's almost 1 year!!!

Can someone help us to update it? The situation is becoming ridiculous....

PS It's not the only one...

Nemo_bis added a comment.Via ConduitOct 20 2013, 10:18 AM

Update on the plan as per comment 70 and comment 82: the monthly update of all 6 disabled pages on all wikis worked, while for the update of the 6 additional en.wiki disabled special pages we have to wait for tomorrow ([[Special:MostLinkedTemplates]]).

To be able to say the above, I checked all 12 pages on en.wiki and one of the 6 pages on a wiki of each cluster; I also took a quick look at graphs like those linked in comment 79 and there wasn't anything worth noting, though I'm thinking of some improvements to the crontabs.

bzimport added a comment.Via ConduitOct 20 2013, 4:23 PM

danny.b wrote:

(In reply to comment #90)

As said above, the update is now set to be monthly. In 5 days from now we
should see the stream of updates.

It is 15 days from "now" and many pages are still not updated.

Broken Redirects, Double Redirects, Uncategorized Pages, Uncategorized Templates, Uncategorized Categories, Wanted Categories, Wanted Templates, Wanted Files, Most Linked Templates, @ cs wikis: 10 Sep

Most Linked Pages says 13 Oct & that update is OFF which seems weird to me.

These are just some, not necessarily all, because I was not checking all of them.

Nemo_bis added a comment.Via ConduitOct 20 2013, 4:31 PM

(In reply to comment #94)

It is 15 days from "now" and many pages are still not updated.

Broken Redirects, Double Redirects, Uncategorized Pages, Uncategorized
Templates, Uncategorized Categories, Wanted Categories, Wanted Templates,
Wanted Files, Most Linked Templates, @ cs wikis: 10 Sep

For the Nth time: that's bug 53227.

Most Linked Pages says 13 Oct & that update is OFF which seems weird to me.

It's weird only for those not reading the summary of this bug, which is about DISABLED special pages.

bzimport added a comment.Via ConduitOct 20 2013, 4:41 PM

danny.b wrote:

(In reply to comment #95)

(In reply to comment #94)
> It is 15 days from "now" and many pages are still not updated.
>
> Broken Redirects, Double Redirects, Uncategorized Pages, Uncategorized
> Templates, Uncategorized Categories, Wanted Categories, Wanted Templates,
> Wanted Files, Most Linked Templates, @ cs wikis: 10 Sep

For the Nth time: that's bug 53227.

nth? It is the very first time mentioned on this page! It was neither in blocking, depending nor see-also bugs, nor in any comment.

> Most Linked Pages says 13 Oct & that update is OFF which seems weird to me.

It's weird only for those not reading the summary of this bug, which is about
DISABLED special pages.

*I* made the summary of this bug 4.5 years ago :-P

If something is being updated even twice a year, it is not DISABLED, thus it must not be written there. (Actually creating new bug for this issue.)

MZMcBride added a comment.Via ConduitNov 17 2013, 2:54 AM

I believe this is relevant (from https://wikitech.wikimedia.org/w/index.php?title=Server_Admin_Log&oldid=89461):


November 17

01:35 Reedy: Killed updateSpecialPages and related processes on terbium
01:18 MaxSem: Killed a few long queries on db1007
01:08 MaxSem: db1007 is having tough times due to special page updates


Both seemed to agree that staggering the updates would sufficiently help.

Springle added a comment.Via ConduitNov 17 2013, 5:35 AM

Staggering things more would be fantastic. Batching even.

Note that, as per db-eqiad.php, Query::recache stuff (which I think these count as) has been pointed to the snapshot slaves, of which db1007 is one, and they are LB=1 for normal traffic.

So technically if those slaves get thrashed it's not a show stopper and we could simply dial back icinga noise for a while. Still...

Nemo_bis added a comment.Via ConduitNov 17 2013, 8:01 AM

Yep, as anticipated in comment 93, the monthly updates will need to be rearranged so that they don't all happen on the same day. I'll submit a patch later today if nobody beats me to it.

gerritbot added a comment.Via ConduitNov 17 2013, 8:01 PM

Change 95889 had a related patch set uploaded by Nemo bis:
Make the monthly querypages updates not hit each cluster on the same day

https://gerrit.wikimedia.org/r/95889

gerritbot added a comment.Via ConduitNov 21 2013, 10:28 AM

Change 95889 merged by Springle:
Make the monthly querypages updates not hit each cluster on the same day

https://gerrit.wikimedia.org/r/95889
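
For readers who want a picture of what "not on the same day" can mean in practice, here is a minimal sketch of one way to spread monthly runs across clusters. It is not the content of change 95889; the function name `stagger_days`, the cluster labels and the 1 to 28 day window are all assumptions made for illustration.

```python
def stagger_days(clusters, first_day=1, last_day=28):
    """Assign each cluster its own day of the month, evenly spaced.

    Hypothetical helper: the names and the 1..28 window are assumptions,
    chosen so that every assigned day exists in every month.
    """
    if not clusters:
        return {}
    span = last_day - first_day + 1
    if len(clusters) > span:
        raise ValueError("more clusters than available days")
    step = span // len(clusters)  # gap in days between consecutive clusters
    return {name: first_day + i * step for i, name in enumerate(clusters)}


if __name__ == "__main__":
    # Illustrative cluster labels only.
    for name, day in stagger_days(["s1", "s2", "s3", "s4", "s5", "s6", "s7"]).items():
        print(f"{name}: run monthly update on day {day}")
```

With seven clusters the gap works out to four days, so no two clusters start their monthly update on the same date.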

Nemo_bis added a comment.Via ConduitJan 9 2014, 2:16 PM

The plan as per comment 70 and comment 82 has been implemented. If someone can just have a crosswiki look at the special pages to ensure they're being updated correctly and at the ganglia graphs to check nothing is going to explode in our face soon, we can confirm it's all done.

Springle added a comment.Via ConduitJan 9 2014, 9:38 PM

These queries were fine in December after Nemo's last patch, plus Tim fixed a related load balancing bug allowing me to properly segregate them. So from DB perspective, seems ok.

Nemo_bis added a comment.Via ConduitJan 12 2014, 3:36 PM

I've checked all the pages listed at [[m:Special:PermaLink/7056706]] and they look fine (updated in the last month), except for 5 out of the 6 reports disabled on en.wiki only... I'll check later what's happening with those and file separately; I call this fixed.

bzimport added a comment.Via ConduitJan 12 2014, 8:28 PM

dcduring wrote:

Thanks. I hope it stays fixed. There is some room for reduction in frequency or elimination of certain reports. If wikis had some kind of resource budget for maintenance reports, it would be possible for them to decide which reports were worth it for them.

OTOH, there are some items like Special:Unwatched Pages for which the maximum of 5,000 pages makes the report silly for a wiki like English Wiktionary. That is connected to the larger question of watchlist editing.

Nemo_bis added a comment.Via ConduitJan 13 2014, 7:59 AM

I asked something like that around comment 47 but it's very hard: currently not even WMF and its own changes have anything like a "database stress" "budget". (Too long a discussion for this bug.)

(In reply to comment #107)
> OTOH, there are some items like Special:Unwatched Pages for which the maximum
> of 5,000 pages makes the report silly for a wiki like English Wiktionary.

Then let's try to make such pages useful. :) You have a few options:

  • extend "unwatchedpages" permission and let people use action=info on individual pages (simple config change),
  • file a consensual config change request to increase the number of results shown for that page (it's probably not too expensive) + a core bug to add such a configuration option,
  • propose some way to make that special page more useful for all wikis.

You don't need a budget to reason about what's important for your wiki and why, and clearly expose your use case/proposal in new bug reports. MediaWiki has so many features that it's often extremely hard for devs to understand on their own what's really important / has a true impact on any given wiki/community (it is even for wiki regulars on wikis they don't know). If you don't document, describe and argue for the needs of your wiki, nobody will do it for you. ;-)

bzimport added a comment.Via ConduitJan 13 2014, 1:50 PM

dcduring wrote:

Before embarking on one of your recommended courses of action:

Would it be possible for us to process the dump to get a list of unwatched pages?

At Wiktionary we usually focus only on "lemma" entries, i.e., not on inflected forms like simple English plural nouns (much more common for languages like Latin), so the actual list of what we care most about is much shorter than the list of ALL unwatched pages. I could, with help from others at Wiktionary, run some Perl scripts to create the desired listing of relevant entries if "unwatched" or "watched" is an attribute on some XML dump file.

ViveLaRosiere added a subscriber: ViveLaRosiere.Via WebNov 27 2014, 3:59 PM
