
Issues with purgeUnusedProjects.php cron job on mwmaint1002 (Fri Oct 26)
Open, Medium Public

Description

Cron <www-data@mwmaint1002> /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/pageassessments.dblist extensions/PageAssessments/maintenance/purgeUnusedProjects.php > /dev/null
[Fri Oct 26 20:42:12 2018] [hphp] [116138:7fcb2bd703c0:0:000001] [] SlowTimer [10555ms] at runtime/ext_mysql: slow query: SELECT /* Wikimedia\Rdbms\Database::select www-data@mwmain... */ DISTINCT( pa_project_id )  FROM `page_assessments

Please let us know what you think. Feel free to remove the "WMF-NDA" tag if you think all information on this task is harmless.

Event Timeline

jijiki triaged this task as Medium priority. Oct 29 2018, 3:18 PM
jijiki created this task.
Restricted Application added a subscriber: Aklapper. Oct 29 2018, 3:18 PM
chasemp added a subscriber: chasemp. Edited Oct 29 2018, 4:29 PM

(meta note)

Just a heads up, this task is currently public and not restricted to WMF-NDA

(top left)

Open, Normal Public

If I edit the task and set the visibility to WMF-NDA, then it will be visible to WMF-NDA only. Projects are confusing, in part because they can be used as objects across multiple functions: CC, projects, ACL objects, etc.

I am going to set this task as visible to WMF-NDA only to demonstrate :)

(now...top left)

Open, Normal WMF-NDA

chasemp changed the visibility from "Public (No Login Required)" to "WMF-NDA (Project)". Oct 29 2018, 4:29 PM
Banyek added a subscriber: Banyek. Dec 6 2018, 1:47 PM

I checked the query

SELECT /* Wikimedia\Rdbms\Database::select www-data@mwmain... */  DISTINCT( pa_project_id )  FROM `page_assessments

According to /srv/mediawiki/dblists/pageassessments.dblist, the script runs against the enwiki, enwikivoyage, and testwiki databases. I don't know which replica this query would be routed to, so I assumed vslow. Here are the results:

host         db            section  results
db1106       enwiki        s1       3009 rows in set (3.30 sec)
db1113:3315  enwikivoyage  s5       11 rows in set (0.07 sec)
db1123       testwiki      s3       3 rows in set (0.04 sec)

enwiki wasn't fast, but 3 seconds is way better than 10

Banyek edited projects, added DBA; removed WMF-NDA.
Banyek changed the visibility from "WMF-NDA (Project)" to "Public (No Login Required)".
Banyek updated the task description. Dec 6 2018, 2:21 PM
Banyek added a project: User-Banyek.
Banyek moved this task from Triage to Backlog on the DBA board. Dec 6 2018, 2:34 PM
Banyek added a comment. Dec 6 2018, 5:04 PM

I'd like to add the owner of the script as a subscriber, but I don't know how to find out who it is.

> I'd like to add the owner of the script as a subscriber, but I don't know how to find out who it is.

git blame can help
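
As a rough illustration of that suggestion, the sketch below tallies which authors last touched each line of a file, using the output of `git blame --line-porcelain`. The helper names are made up for this example, and the path in the usage note assumes the script sits under maintenance/ in a local checkout of the PageAssessments extension, as the cron command suggests.

```python
import subprocess
from collections import Counter


def count_blame_authors(porcelain: str) -> Counter:
    """Tally authors from `git blame --line-porcelain` output (hypothetical helper)."""
    authors = Counter()
    for line in porcelain.split("\n"):
        # Each blamed line has an "author <name>" header. File content
        # lines are prefixed with a tab, so they never match this prefix,
        # and "author-mail"/"author-time" headers lack the trailing space.
        if line.startswith("author "):
            authors[line[len("author "):]] += 1
    return authors


def blame_authors(path: str, repo_dir: str = ".") -> Counter:
    """Run git blame on `path` inside `repo_dir` and tally the authors."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", "--", path],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return count_blame_authors(result.stdout)
```

From a checkout of the extension, something like `blame_authors("maintenance/purgeUnusedProjects.php").most_common(3)` would list the most frequent recent authors, who are reasonable people to subscribe.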

Restricted Application added a project: Community-Tech. Dec 7 2018, 8:26 AM
Banyek added a comment. Dec 7 2018, 8:48 AM

@kaldari if you need any help debugging this further, just ask me.

Banyek moved this task from Backlog to Wait on external on the User-Banyek board. Dec 7 2018, 8:55 AM
Banyek moved this task from Wait on external to FYI on the User-Banyek board. Dec 7 2018, 3:24 PM

@Banyek - Thanks for the ping. I don't think anything is unexpected here. This particular clean-up routine is expensive, which is why it was put in a cron job. Is there anything I can add to the script to indicate that? If we need to get it running faster than 3 seconds, let me know.

I think we should adjust the slow timer so that it does not alert when this script runs for up to n seconds.
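
To make the idea concrete, here is a minimal sketch of a per-script threshold check. It is not how the real SlowTimer is configured; the function name, the default threshold, and the override value are all placeholders chosen for illustration.

```python
# Hypothetical values: the real slow-timer threshold is not stated in this task.
DEFAULT_THRESHOLD_MS = 10_000
SCRIPT_THRESHOLDS_MS = {
    # Known-expensive maintenance scripts get a higher limit ("n seconds").
    "purgeUnusedProjects.php": 60_000,
}


def should_alert(script: str, duration_ms: int,
                 overrides: dict = SCRIPT_THRESHOLDS_MS,
                 default_ms: int = DEFAULT_THRESHOLD_MS) -> bool:
    """Return True only if the run exceeded the threshold for this script."""
    return duration_ms > overrides.get(script, default_ms)
```

With these placeholder values, the 10555 ms run from the cron output would no longer alert for purgeUnusedProjects.php, while the same duration would still alert for a script without an override.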

This has evolved into fatals: T219935.

aezell added a subscriber: aezell. Apr 2 2019, 9:27 PM

This is interesting. T219935 seems to indicate that the query is now poorly formed instead of just expensive. I'm taking a look to see if I can see an obvious regression.