Every year we run a query on Quarry to retrieve the list eligible voters for that year's Supervisory Council elections (SC is fawiki's entity similar to enwiki's Arb Com). Given the recent changes in the databases, namely the addition of actor table, the query has gotten slower over time and now, it does not finish in the time limit Quarry imposes (which I think is 20 minutes).
While we can break this into multiple subqueries and combine them externally, we really prefer that it is a query that is publicly run on Quarry so that the fawiki community can transparently see its results.
The latest version of the query is below. Here is the logic in simple words:
- Given that the nomination period starts at 20201012000000
- Find all users whose account was created at least 3 months prior to the start of the nomination period
- Old accounts don't have an account creation log, so accept those too if they have an edit at least 3 months prior to the start of the nomination period
- AND who have had at least 500 edits in the article namespace any time before the nomination period
- AND who have at least 100 edits in the main namespace in the last 12 months leading to the nomination period
- AND are not bots
I tried my best to use alternative views, but that seems not to be sufficient. Is it possible to optimize this further?
use fawiki_p; select concat('[[User:', user_name, '|]]'), concat(left(log_timestamp, 4), '-', mid(log_timestamp, 5, 2), '-', mid(log_timestamp, 7, 2)) as acct_created, concat(left(first_timestamp, 4), '-', mid(first_timestamp, 5, 2), '-', mid(first_timestamp, 7, 2)) as first_edit, concat('{{formatnum:', fivehundred.cnt, '}}') as article_edits, concat('{{formatnum:', onehundred.cnt, '}}') as recent_edits, '' as voter_status from user join actor_logging on actor_user = user_id left join logging_userindex on log_actor = actor_id and log_type = 'newusers' and log_action in ('create', 'autocreate') -- creating main account and log_timestamp < '20200712000000' -- occured 3 or more months prior to nomination start date join ( select actor_user, min(rev_timestamp) as first_timestamp from revision_userindex join actor_revision on rev_actor = actor_id group by actor_user ) first_edit on user_id = first_edit.actor_user join ( select actor_user, count(*) cnt from revision_userindex join actor_revision on rev_actor = actor_id join page on page_id = rev_page where page_namespace = 0 and rev_timestamp < '20201012000000' -- edits prior to nomination start date group by actor_user having count(*) >= 500 ) fivehundred -- at least 500 edits in main namspace on user_id = fivehundred.actor_user join ( select actor_user, count(*) cnt from revision_userindex join actor_revision on rev_actor = actor_id join page on page_id = rev_page where page_namespace = 0 and rev_timestamp > '20191012000000' -- last 12 months leading to nomination start date and rev_timestamp < '20201012000000' group by actor_user having count(*) >= 100 ) onehundred -- at least 100 edits in the main namespace in the last twelve months leading to the nomination start date on user_id = onehundred.actor_user left join ( select ug_user from user_groups where ug_group = 'bot' ) bots on bots.ug_user = user_id where bots.ug_user is null and ( log_timestamp is not null or first_timestamp < '20200712000000' -- occured 3 or more months prior to nomination start date ) order by user_name