Run updateArticleCount.php regularly
Closed, Resolved, Public

Description

Because of the major-priority tasks T42009/T47269 (abuse of the import feature), I ask that the InitStats maintenance script be run daily, or at least weekly. The updateArticleCount.php script can be started with a cron job. After some imports, the statistics page counter on the German Wikivoyage is now at 14,867 instead of about 12,600, i.e. the count shown is about 18 % higher than the real one (roughly 15 % of the displayed figure is spurious).
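The size of the overcount depends on which figure is used as the base. A quick check with the numbers reported for German Wikivoyage (the 12,600 real count is the reporter's approximation; the helper name `overcount` is purely illustrative):

```python
def overcount(shown: int, actual: int) -> tuple[int, float, float]:
    """Return (excess pages, % above the real count, % of the shown count)."""
    excess = shown - actual
    return excess, excess / actual * 100, excess / shown * 100

# Figures reported for de.wikivoyage; the real count is approximate.
excess, pct_above_real, pct_of_shown = overcount(14_867, 12_600)
print(f"excess={excess}, {pct_above_real:.1f}% above real, {pct_of_shown:.1f}% of shown")
# prints: excess=2267, 18.0% above real, 15.2% of shown
```

So about 15 % of the displayed number is spurious, which corresponds to a counter running roughly 18 % above the real count.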

T48981 was only a temporary solution, not a permanent one.

After the first update, dcljr made a list of the biggest variations in counts: https://meta.wikimedia.org/w/index.php?title=Wikimedia_News&oldid=11739239#March_2015


Version: wmf-deployment
Severity: normal

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:25 AM
bzimport set Reference to bz66867.
bzimport added a subscriber: Unknown Object (MLST).

Bug 45269 is solved, but the other one is not. I agree with a weekly run also for https://it.wikivoyage.org and the other language versions, until all the page-count bugs have been completely solved.

Change 178170 had a related patch set uploaded (by Nemo bis):
Update cached article count monthly to avoid social unrest

https://gerrit.wikimedia.org/r/178170

"social unrest" Nemo you are always able to crack me up :-D

Change 178517 had a related patch set uploaded (by Nemo bis):
updateArticleCount.php: use "vslow" DB by default, allow master

https://gerrit.wikimedia.org/r/178517

Change 178517 merged by jenkins-bot:
updateArticleCount.php: use "vslow" DB by default, allow master

https://gerrit.wikimedia.org/r/178517

Change 178170 merged by Springle:
Update cached article count monthly to avoid social unrest

https://gerrit.wikimedia.org/r/178170

Nemo_bis renamed this task from "Run updateArticleCount.php on German Wikivoyage weekly" to "Run updateArticleCount.php regularly". Jan 30 2015, 3:27 PM
Nemo_bis closed this task as Resolved.
Nemo_bis claimed this task.
Nemo_bis set Security to None.

This should be fixed, but I don't see big changes in the statistics. It's possible that most of the backlog was already cleared by the stats update that confused sv.wiki, whatever triggered it.

Hi Nemo, are you sure that it works? I've just checked es:voy (where I know they have recently had a counting problem) and found the following numbers:

  • Pages (from stats): 4,090
  • NS:0 count: 2,390 (including 555 redirects)

Could you check it?
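MediaWiki does not count redirects as articles, so the main-namespace figures above put an upper bound on what the counter should show (depending on the wiki's article-count method, some non-redirect pages may be excluded too). A minimal sketch of that arithmetic; the helper name `max_countable` is hypothetical:

```python
def max_countable(ns0_total: int, ns0_redirects: int) -> int:
    """Upper bound on countable articles: main-namespace pages that are not redirects."""
    return ns0_total - ns0_redirects

stats_count = 4_090                 # figure shown by the statistics page
bound = max_countable(2_390, 555)   # NS:0 pages minus redirects
print(bound, stats_count - bound, round(stats_count / bound, 2))
# prints: 1835 2255 2.23
```

In other words, the counter on es:voy showed more than twice the largest value it could legitimately have had.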

I forgot to reopen it...

Change 188066 had a related patch set uploaded (by Nemo bis):
Actually run misc::maintenance::update_article_count

https://gerrit.wikimedia.org/r/188066

Change 188066 merged by Ori.livneh:
Actually run misc::maintenance::update_article_count

https://gerrit.wikimedia.org/r/188066

I hadn't understood your comment. Indeed, stats said 4,090 countable pages, but there were only 2,390 pages in total in ns0 (the only content namespace). We have now done the last missing step; closing.

The effects of the solution will be verified on the 29th (so in March...).

In November 2014 I asked for updateArticleCount.php, or a similar tool, to be run regularly. This is necessary on all Wikivoyage sites because of their small number of articles and the unsolved bug T42009. I think a weekly update would be sufficient.

Normally this can be done with a simple cron job, and it is difficult for me to understand why this is not being done.

Nemo, is the current schedule still monthly (on the 29th) or already weekly?

Glaisher removed a subscriber: Unknown Object (MLST).

I'm wondering whether we shouldn't change the day the script runs to the 28th, which _every_ month has...

That's the first thing I thought of when I read about February 29th :-D
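The problem with the 29th is easy to confirm: a cron-style schedule pinned to a day of the month has no matching date in months that are shorter than that. A small sketch using Python's standard `calendar` module (the function name is illustrative):

```python
import calendar

def months_without_day(year: int, day: int) -> list[int]:
    """Months of `year` with no calendar day `day`; a monthly job pinned to it skips them."""
    # calendar.monthrange returns (weekday of the 1st, number of days in the month).
    return [m for m in range(1, 13) if calendar.monthrange(year, m)[1] < day]

print(months_without_day(2015, 29))  # [2]  (February 2015 has only 28 days)
print(months_without_day(2016, 29))  # []   (2016 is a leap year)
print(months_without_day(2015, 28))  # []   (every month has a 28th)
```

Any day from 1 to 28 gives exactly twelve runs a year, which is why the 28th is the latest safe choice.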

Change 205187 had a related patch set uploaded (by Alex Monk):
Run updateArticleCount on the 28th of each month, not 29th

https://gerrit.wikimedia.org/r/205187

Change 205187 merged by Dzahn:
Run updateArticleCount on the 28th of each month, not 29th

https://gerrit.wikimedia.org/r/205187

Apparently the script is about to run again (assuming "hour => 5," refers to UTC), so how long should it take to complete all the updates? Where do I look to see if it's finished running? (Apologies for using this task for my question, if that's frowned upon.)
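To work out when the next run is due, here is a minimal sketch that assumes the schedule is the 28th of each month at 05:00, with "hour => 5" read as UTC (both are assumptions taken from this thread, not confirmed from the deployed configuration); `next_monthly_run` is a hypothetical helper:

```python
from datetime import datetime, timezone

def next_monthly_run(now: datetime, day: int = 28, hour: int = 5) -> datetime:
    """Return the first time on or after `now` that falls on `day` at `hour`:00.

    Assumes 1 <= day <= 28 so the day exists in every month; pass a
    timezone-aware UTC `now` if the schedule's hour is read as UTC.
    """
    if not 1 <= day <= 28:
        raise ValueError("day must be between 1 and 28")
    candidate = now.replace(day=day, hour=hour, minute=0, second=0, microsecond=0)
    if candidate < now:
        # This month's slot has already passed: use the same slot next month.
        if now.month == 12:
            candidate = candidate.replace(year=now.year + 1, month=1)
        else:
            candidate = candidate.replace(month=now.month + 1)
    return candidate

now = datetime(2015, 4, 28, 6, 0, tzinfo=timezone.utc)
print(next_monthly_run(now))  # 2015-05-28 05:00:00+00:00
```

This only says when the job should start; how long it takes and where its output is logged depend on the server setup.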

Can someone verify that the script actually ran? Also, see the questions in my last post.