Every year we run a query on #quarry to retrieve the list eligible voters for that year's Supervisory Council elections (SC is fawiki's entity similar to enwiki's Arb Com). Given the recent changes in the databases, namely the addition of `actor` table, the query has gotten slower over time and now, it does not finish in the time limit Quarry imposes (which I think is 20 minutes).
While we can break this into multiple subqueries and combine them externally, we really prefer that it is a query that is publicly run on Quarry so that the fawiki community can transparently see its results.
The latest version of the query is below. Here is the logic in simple words:
* Given that the nomination period starts at 20201012000000
* Find all users whose account was created at least 3 months prior to the start of the nomination period
** Old accounts don't have an account creation log, so accept those too
* AND who have had at least 500 edits in the article namespace any time before the nomination period
* AND who have at least 100 edits in the main namespace in the last 12 months leading to the nomination period
* AND are not bots
I tried my best to use alternative views, but that seems not to be sufficient. Is it possible to optimize this further?
```lang=sql
use fawiki_p;
select
concat('[[User:', user_name, '|]]'),
concat(left(log_timestamp, 4), '-', mid(log_timestamp, 5, 2), '-', mid(log_timestamp, 7, 2)) as acct_created,
concat(left(first_timestamp, 4), '-', mid(first_timestamp, 5, 2), '-', mid(first_timestamp, 7, 2)) as first_edit,
concat('{{formatnum:', fivehundred.cnt, '}}') as article_edits,
concat('{{formatnum:', onehundred.cnt, '}}') as recent_edits,
'' as voter_status
from user
join actor_logging
on actor_user = user_id
left join logging_userindex
on log_actor = actor_id
and log_type = 'newusers'
and log_action in ('create', 'autocreate') -- creating main account
and log_timestamp < '20200712000000' -- occured 3 or more months prior to nomination start date
join
(
select
actor_user,
min(rev_timestamp) as first_timestamp
from revision_userindex
join actor_revision
on rev_actor = actor_id
group by actor_user
) first_edit
on user_id = first_edit.actor_user
join
(
select
actor_user,
count(*) cnt
from revision_userindex
join actor_revision
on rev_actor = actor_id
join page
on page_id = rev_page
where
page_namespace = 0
and rev_timestamp < '20201012000000' -- edits prior to nomination start date
group by actor_user
having count(*) >= 500
) fivehundred -- at least 500 edits in main namspace
on user_id = fivehundred.actor_user
join
(
select
actor_user,
count(*) cnt
from revision_userindex
join actor_revision
on rev_actor = actor_id
join page
on page_id = rev_page
where
page_namespace = 0
and rev_timestamp > '20191012000000' -- last 12 months leading to nomination start date
and rev_timestamp < '20201012000000'
group by actor_user
having count(*) >= 100
) onehundred -- at least 100 edits in the main namespace in the last twelve months leading to the nomination start date
on user_id = onehundred.actor_user
left join
(
select ug_user
from user_groups
where ug_group = 'bot'
) bots
on bots.ug_user = user_id
where
bots.ug_user is null
and
(
log_timestamp is not null
or
first_timestamp < '20200712000000' -- occured 3 or more months prior to nomination start date
)
order by user_name
```