
Addition of last hit date to Special:AbuseFilter table
Open, Low, Public

Description

Spammers and abusers come in cycles, and certain filters have periods of quiescence. The only means of monitoring filters is either to visually monitor (eyeball) the log, or to try to remember the count column and whether it has changed since you last checked. The addition of a "Last hit" column per filter line would allow those of us who check filters daily/regularly to see at a glance whether we should be digging further into the log of a filter.

Details

Related Gerrit Patches:
mediawiki/extensions/AbuseFilter (master): [WIP] Add last hit column to Special:AbuseFilter
operations/mediawiki-config (master): mariadb: Depool db2055 for performance testing
mediawiki/extensions/AbuseFilter (master): Add "Last hit" timestamp column to Special:AbuseFilter list view

Event Timeline

Billinghurst raised the priority of this task from to Needs Triage.
Billinghurst updated the task description. (Show Details)
Billinghurst added a project: AbuseFilter.
Billinghurst added a subscriber: Billinghurst.
Restricted Application added a subscriber: Aklapper. · View Herald Transcript · Mar 23 2015, 9:19 AM
Aklapper triaged this task as Low priority. · Mar 26 2015, 2:23 PM
Glaisher claimed this task. · Mar 29 2015, 5:23 PM
Glaisher added a subscriber: Glaisher.

Change 200481 had a related patch set uploaded (by Glaisher):
Add "last hit timestamp" column to Special:AbuseFilter list view

https://gerrit.wikimedia.org/r/200481

Change 200481 merged by jenkins-bot:
Add "Last hit" timestamp column to Special:AbuseFilter list view

https://gerrit.wikimedia.org/r/200481

Restricted Application added a subscriber: Luke081515. · View Herald Transcript · Jul 17 2015, 10:45 AM
Glaisher closed this task as Resolved. · Jul 17 2015, 11:21 AM
Glaisher added a subscriber: Addshore.

Thanks @Addshore!

Yeah, it's very useful and makes management easier. Could it be made sortable like the other columns?

matej_suchanek set Security to None.
matej_suchanek removed a subscriber: gerritbot.
Legoktm reopened this task as Open. · Jul 24 2015, 10:21 AM
Legoktm added a subscriber: Legoktm.

Reverted due to the query timing out on various large projects. T106798#1478602 shows the query plan :/

@jcrespo Can you perhaps help with a better implementation for the query when you have time? Noting that this is low priority so this is not necessary but would be nice. :)

Restricted Application added a subscriber: JEumerus. · View Herald Transcript · Feb 7 2016, 2:51 PM

On my wiki, running the previously used query gives the following:

EXPLAIN SELECT af_id,af_enabled,af_deleted,af_global,af_public_comments,af_hidden,af_hit_count,af_timestamp,af_user_text,af_user,af_actions,af_group,MAX(afl_timestamp) AS afl_timestamp FROM `abuse_filter` LEFT JOIN `abuse_filter_log` ON ((af_id=afl_filter)) WHERE af_deleted = 0 GROUP BY af_id ORDER BY af_id LIMIT 51
+----+-------------+------------------+-------+------------------+------------------+---------+------+------+--------------------------------------------------------------+
| id | select_type | table            | type  | possible_keys    | key              | key_len | ref  | rows | Extra                                                        |
+----+-------------+------------------+-------+------------------+------------------+---------+------+------+--------------------------------------------------------------+
|  1 | SIMPLE      | abuse_filter     | ALL   | NULL             | NULL             | NULL    | NULL |  146 | Using where; Using temporary; Using filesort                 |
|  1 | SIMPLE      | abuse_filter_log | index | filter_timestamp | filter_timestamp | 80      | NULL | 4144 | Using where; Using index; Using join buffer (flat, BNL join) |
+----+-------------+------------------+-------+------------------+------------------+---------+------+------+--------------------------------------------------------------+

And this sucks.

jcrespo moved this task from Triage to Backlog (help welcome) on the DBA board.

Is it possible to look at something other than a binary solution (off/on)? For those of us who do the unheralded work in the background (spambots, vandals, LTA), having something to make our lives easier would be most helpful, especially at Meta for the global filters, where the spambot hits drown out everything else. Is there an intermediate measure, other than generating it at the point of call? Could this be updated hourly, or six-hourly?

The result could probably be cached if need be, but the point is that this query is terrible. Maybe we could optimize it with a subquery, but that would be somewhat hacky. Otherwise, I guess we'd need to add indexes, i.e. perform a schema change, which isn't quick to do.

@Daimona <shrug> I am just a nuff-nuff spam handler and a typical end user who needs some abilities to better handle the abuse filter/log, which has become a morass. I talk about desires for tools that allow me to work efficiently, and to better and more effectively protect our wikis. I generally put up with the inefficiencies in the system; however, when I get to the point of "f*** it" and am throwing my hands up (again) because this is wearing me down, then I will speak up.

I know that I work in an unsexy, somewhat invisible space, and we need to balance where we allocate resources. If this is not considered important, or of little or no value, then people just need to say so; I am sure that I can find something else to do.

I am an end user myself who fiddles with abuse filters every single day, so I'm already aware of several parts of this extension that should be improved. Many of them are being worked on, but it's not a quick process, and sometimes it's not easy either.

In this specific case, I personally fail to see the last hit date as an indispensable feature. Still, I'd add it anyway, if it weren't for the major performance issue outlined above. That's not easy to work around, and would instead require a big change which, IMHO, shouldn't be prioritized, given that we already have several important changes on hold.

If it's absolutely essential for you to see the last hit for each filter, and as you said above it would be fine to update it every hour or so, you can just run the query manually on Quarry.
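For example, here is a minimal sketch of a query one could run on Quarry for this, assuming abuse_filter_log is exposed on that wiki's replica database (columns for private filters may be redacted there); this is only an illustration, not the query the extension runs:

-- Last hit per filter, newest first.
SELECT afl_filter, MAX(afl_timestamp) AS last_hit
FROM abuse_filter_log
GROUP BY afl_filter
ORDER BY last_hit DESC;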

So I looked at this again. None of the improvements I could think of would really increase the performance of the query. The only concrete boost would be to turn the JOIN + GROUP BY into a subquery, i.e.

SELECT af_id, af_enabled, af_deleted, af_global, af_public_comments, af_hidden, af_hit_count, af_timestamp, af_user_text, af_user, af_actions, af_group, (SELECT MAX(afl_timestamp) FROM `abuse_filter_log` WHERE ( ( af_id = afl_filter ) ) LIMIT 1) AS afl_timestamp FROM `abuse_filter` WHERE af_deleted = 0 ORDER BY af_id LIMIT 51

This query performs very well, but isn't pretty to look at due to the LIMIT 1, and we'd need to specify an additional ORDER BY afl_id DESC to really ensure we get the last hit. That wouldn't impact performance, as it's very unlikely that we'll ever get duplicate afl_timestamp values for rows matching af_id = afl_filter.

The only question is: how hacky would it be to use a subquery in getQueryInfo? I couldn't really find similar examples, and generally speaking I'd be against it. However, in this specific case it could be acceptable. I'm submitting a patch shortly, then we can go on discussing with some code at hand.

I'm almost there. A couple of notes:

  • As opposed to what I said above, the ORDER BY in the subquery is not necessary: the last hit column only shows the timestamp, and if there are duplicates we can of course just pick one. I initially thought that we were going to add a link to the log entry.
  • The last hit field will not be sortable. It would be easy and quick to do a simple ORDER BY afl_timestamp, but the problem comes when an offset is specified: that causes a WHERE clause like afl_timestamp > 123. But afl_timestamp is the result of the subquery and cannot be used in a WHERE clause (see the sketch below). To make it work we would have to add abuse_filter_log to the list of tables, but that would bring the performance issues back. I'm still leaving a ToDo in the code, but it won't be easy to accomplish with the current schema.
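To make the sortability point concrete, here is a rough sketch (with a hypothetical offset value) of the paging query the pager would have to build for a sortable last-hit column, and why it fails:

-- Hypothetical paging query for a sortable "last hit" column. MySQL rejects it,
-- because the alias produced by the subquery is not visible in the WHERE clause.
SELECT af_id,
  (SELECT MAX(afl_timestamp) FROM abuse_filter_log WHERE af_id = afl_filter) AS afl_timestamp
FROM abuse_filter
WHERE af_deleted = 0
  AND afl_timestamp > '20190101000000'  -- fails: Unknown column 'afl_timestamp' in 'where clause'
ORDER BY afl_timestamp
LIMIT 51;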

Unrelated: throughout the AbuseFilter code we perform several queries on the abuse_filter table with af_deleted = 0 in the WHERE clause. Notably, this happens on the default view of Special:AbuseFilter and every time the filters to execute for the current action are picked from the DB. Although the abuse_filter table is always pretty small (no more than 1000 rows on enwiki) and the query is already fast, maybe we could speed it up even more by adding an index on af_deleted, as sketched below.
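For reference, a minimal sketch of such an index (speculative, not part of any merged patch; whether it pays off on so small a table is an open question):

-- Speculative index on af_deleted; illustration only.
ALTER TABLE abuse_filter ADD INDEX af_deleted (af_deleted);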


New query performance from beta cluster:

MariaDB [deploymentwiki]> EXPLAIN SELECT af_id,af_enabled,af_deleted,af_pattern,af_global,af_public_comments,af_hidden,af_hit_count,af_timestamp,af_user_text,af_user,af_actions,af_group,af_throttled,(SELECT MAX(afl_timestamp) FROM `abuse_filter_log` WHERE (af_id = afl_filter) LIMIT 1 ) AS `afl_timestamp` FROM `abuse_filter` WHERE af_deleted = '0' AND af_enabled = '1' AND af_global = '0' ORDER BY af_id LIMIT 51\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: abuse_filter
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 30
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: abuse_filter_log
         type: index
possible_keys: filter_timestamp
          key: filter_timestamp
      key_len: 80
          ref: NULL
         rows: 343
        Extra: Using where; Using index
2 rows in set (0.01 sec)

Change 487010 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Add last hit column to Special:AbuseFilter

https://gerrit.wikimedia.org/r/487010

Daimona claimed this task. · Jan 29 2019, 10:05 AM

@jcrespo Could you please check the query from my last comment and see how it goes?

jcrespo moved this task from Backlog (help welcome) to Next on the DBA board. · Jan 29 2019, 10:07 AM

Not looking good on production (dependent subquery with a full table scan):

root@db1106.eqiad.wmnet[enwiki]> EXPLAIN SELECT af_id,af_enabled,af_deleted,af_pattern,af_global,af_public_comments,af_hidden,af_hit_count,af_timestamp,af_user_text,af_user,af_actions,af_group,af_throttled,(SELECT MAX(afl_timestamp) FROM `abuse_filter_log` WHERE (af_id = afl_filter) LIMIT 1 ) AS `afl_timestamp` FROM `abuse_filter` WHERE af_deleted = '0' AND af_enabled = '1' AND af_global = '0' ORDER BY af_id LIMIT 51\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: abuse_filter
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 51
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: abuse_filter_log
         type: index
possible_keys: afl_filter
          key: afl_filter
      key_len: 82
          ref: NULL
         rows: 22761859
        Extra: Using where; Using index
2 rows in set (0.00 sec)

If the stats were accurate, that would be roughly a 1000-million-row scan (51 outer rows times ~22.7 million rows in the dependent subquery), which would be a big no (it would not finish within 60 seconds and would get killed).

I finally have some time to look at this, but I am confused about the initial goal; the summary is helpful but unclear on the details. @Daimona, will you have some time on IRC (freenode, DBA) to chat and better understand it? Sorry, maybe you told me already, but I may have forgotten.

jcrespo moved this task from Next to In progress on the DBA board. · Feb 5 2019, 5:19 PM

So, first of all, abuse_filter_log is a varbinary(64) and not a bigint, and that will cause all sorts of issues, like potentially refusing to use the right index on a JOIN. That should be fixed. On paper it is defined as varchar(64) binary NOT NULL, which is the same as varbinary. This is because a string is being used instead of a number (a 32- or 64-bit int).

I will check if the above issue can be fixed without doing that first, but the above should be a high priority.

What you are trying to do is a join combined with a group-wise maximum.

The correlated subquery version works, but it is too slow:

root@db1106.eqiad.wmnet[enwiki]> SELECT * FROM abuse_filter_log afl1 WHERE afl_id = (SELECT MAX(afl_id) FROM abuse_filter_log afl2 WHERE afl1.afl_filter = afl2.afl_filter ) ORDER BY afl_filter;
+----------+------------+
| afl_id   | afl_filter | 
+----------+------------+
| 23140805 | 1          | 
|  2869380 | 10         | 
|   588458 | 101        |
...
| 23180767 | 11         |
...
822 rows in set (50.12 sec)

Also notice that it is ordered lexicographically rather than numerically, which looks odd.

> So, first of all, abuse_filter_log is a varbinary(64) and not a bigint, and that will cause all sorts of issues, like potentially refusing to use the right index on a JOIN. That should be fixed. On paper it is defined as varchar(64) binary NOT NULL, which is the same as varbinary. This is because a string is being used instead of a number (a 32- or 64-bit int).

I guess you meant abuse_filter_log.afl_filter, and yes, that is a known issue tracked in T42757. This patch is a quick fix to unbreak Postgres compatibility, while this one is the long-term solution. The field is a varbinary only because its possible values may be "$id" or "global-$id", with $id being an integer, and the plan is to add another column, abuse_filter_log.afl_global, so that we only store the ID in afl_filter. The patch is WIP but I guess I should start working on it again.
Aside from the obvious benefit of having two separate columns, I think it would also boost performance (a little) by avoiding implicit CASTs for every row.
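As an illustration only (the column name follows the plan above, but the type and default are assumptions; the actual change is the WIP patch for T42757 mentioned above), the split could look roughly like this:

-- Record "global-ness" in its own flag so afl_filter can hold a plain numeric ID.
ALTER TABLE abuse_filter_log
  ADD COLUMN afl_global tinyint(1) NOT NULL DEFAULT 0;
-- afl_filter would then store only the numeric filter ID instead of "global-$id".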

So IIUC we should first address T42757 before moving on, right?

jcrespo added a comment. · Edited · Feb 6 2019, 12:21 PM

> So IIUC we should first address T42757 before moving on, right?

Please let me complete the analysis before giving a final recommendation, I was drawn into a meeting before I could finish it.

> it would also boost performance (a little) by avoiding implicit CASTs for every row.

That is normally not a huge issue; the real problem is joining columns of different types, which can and often does confuse the query planner because the comparison function used is not optimal (the casting itself is not such an issue, but rather the refusal to use an index because the optimizer doesn't understand that integer and string comparison functions would give the same results). Your reason is true, but a secondary concern.

@jcrespo Ah, sure. In the meanwhile, I'll revamp the WIP patch, which will be needed either way at some point.

The analysis will extend a bit more, as I need to alter the 10 GB table on a spare host to check whether a type change and a new index would work.

Indeed, the alter will be essential, from what I see:

root@db1106.eqiad.wmnet[enwiki]> EXPLAIN select max(afl_id) FROM abuse_filter_log WHERE afl_filter = '3'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: abuse_filter_log
         type: ref
possible_keys: afl_filter
          key: afl_filter
      key_len: 66
          ref: const
         rows: 1185562
        Extra: Using where; Using index
1 row in set (0.01 sec)

root@db1106.eqiad.wmnet[enwiki]> select max(afl_id) FROM abuse_filter_log WHERE afl_filter = 3\G                 
*************************** 1. row ***************************
max(afl_id): <id>
1 row in set (5.48 sec)

root@db1106.eqiad.wmnet[enwiki]> EXPLAIN select max(afl_id) FROM abuse_filter_log WHERE afl_filter = 3\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: abuse_filter_log
         type: index
possible_keys: afl_filter
          key: afl_filter
      key_len: 82
          ref: NULL
         rows: 22769370
        Extra: Using where; Using index
1 row in set (0.00 sec)


root@db1106.eqiad.wmnet[enwiki]> select max(afl_id) FROM abuse_filter_log WHERE afl_filter = '3'\G
*************************** 1. row ***************************
max(afl_id): <id>
1 row in set (0.28 sec)

That 20x difference could be the difference between being able to run the query or not (as this lookup will be done 300 times). However, it is still very slow, so we may have to create an (afl_filter, afl_id) compound index.

Change 488410 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db2055 for performance testing

https://gerrit.wikimedia.org/r/488410

Change 488410 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db2055 for performance testing

https://gerrit.wikimedia.org/r/488410

Mentioned in SAL (#wikimedia-operations) [2019-02-06T15:08:56Z] <jynus@deploy1001> Synchronized wmf-config/db-codfw.php: Depool db2055 for performance testing T93564 (duration: 00m 55s)

Mentioned in SAL (#wikimedia-operations) [2019-02-06T16:19:18Z] <jynus> running alter table on db2055 T93564

This comment was removed by jcrespo.
I don't consider the IDs private, but just in case:
MariaDB [enwiki]> select max(afl_id) FROM abuse_filter_log WHERE afl_filter = '3';
+-------------+   
| max(afl_id) |   
+-------------+   
|    <id> |   
+-------------+   
1 row in set (2.67 sec)

MariaDB [enwiki]> select max(afl_id) FROM abuse_filter_log WHERE afl_filter = 3;
[many seconds later]
^CCtrl-C -- query killed. Continuing normally.

MariaDB [enwiki]> set sql_log_bin=0; alter table abuse_filter_log add index (`afl_filter`, `afl_id`);
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (2 min 15.96 sec)
Records: 0  Duplicates: 0  Warnings: 1

MariaDB [enwiki]> select max(afl_id) FROM abuse_filter_log WHERE afl_filter = '3';
+-------------+
| max(afl_id) |
+-------------+
|    <id> |
+-------------+
1 row in set (0.00 sec)

With the index, it even works without the quotes.

MariaDB [enwiki]> select max(afl_id) FROM abuse_filter_log WHERE afl_filter = 3;
+-------------+
| max(afl_id) |
+-------------+
|    <id> |
+-------------+
1 row in set (0.00 sec)

That looks better; now we only have to construct the right query that will return all results, even if it has to run the previous one 300 times.

I think, with the alter, this should be the strategy:

EXPLAIN SELECT * FROM abuse_filter_log JOIN abuse_filter ON af_id = afl_filter WHERE afl_id = (SELECT max(afl_id) FROM abuse_filter_log WHERE afl_filter = 1)
union all
SELECT * FROM abuse_filter_log JOIN abuse_filter ON af_id = afl_filter WHERE afl_id = (SELECT max(afl_id) FROM abuse_filter_log WHERE afl_filter = 2)
union all
SELECT * FROM abuse_filter_log JOIN abuse_filter ON af_id = afl_filter WHERE afl_id = (SELECT max(afl_id) FROM abuse_filter_log WHERE afl_filter = 3);
+------+-------------+------------------+-------+---------------------------------+---------+---------+-------+------+------------------------------+
| id   | select_type | table            | type  | possible_keys                   | key     | key_len | ref   | rows | Extra                        |
+------+-------------+------------------+-------+---------------------------------+---------+---------+-------+------+------------------------------+
|    1 | PRIMARY     | abuse_filter_log | const | PRIMARY,afl_filter,afl_filter_2 | PRIMARY | 8       | const |    1 |                              |
|    1 | PRIMARY     | abuse_filter     | const | PRIMARY                         | PRIMARY | 8       | const |    1 |                              |
|    2 | SUBQUERY    | NULL             | NULL  | NULL                            | NULL    | NULL    | NULL  | NULL | Select tables optimized away |
|    3 | UNION       | abuse_filter_log | const | PRIMARY,afl_filter,afl_filter_2 | PRIMARY | 8       | const |    1 |                              |
|    3 | UNION       | abuse_filter     | const | PRIMARY                         | PRIMARY | 8       | const |    1 |                              |
|    4 | SUBQUERY    | NULL             | NULL  | NULL                            | NULL    | NULL    | NULL  | NULL | Select tables optimized away |
|    5 | UNION       | abuse_filter_log | const | PRIMARY,afl_filter,afl_filter_2 | PRIMARY | 8       | const |    1 |                              |
|    5 | UNION       | abuse_filter     | const | PRIMARY                         | PRIMARY | 8       | const |    1 |                              |
|    6 | SUBQUERY    | NULL             | NULL  | NULL                            | NULL    | NULL    | NULL  | NULL | Select tables optimized away |
+------+-------------+------------------+-------+---------------------------------+---------+---------+-------+------+------------------------------+

Let me find the non-ugly version of this.

jcrespo moved this task from In progress to Done on the DBA board. · Feb 6 2019, 5:56 PM

So this is my final recommendation:

  1. Prepare and deploy the alter table:
add index (`afl_filter`, `afl_id`)
  2. Change the query to a variant of the following (it depends on the fields needed; a LEFT JOIN can be used instead of a JOIN, etc.):
SELECT af.*, afl.* -- change the * for the right fields, explicitly (note selecting all fields can be a lot of bytes)
FROM abuse_filter af
JOIN abuse_filter_log afl ON af.af_id = afl.afl_filter
JOIN (SELECT afl_filter, MAX(afl_id) AS afl_id FROM abuse_filter_log GROUP BY afl_filter) as aflg ON afl.afl_id = aflg.afl_id
ORDER BY af.af_id; -- order by the field you need

Execution time:

823 rows in set (0.04 sec) -- time on a non-ssd host

Explain:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 88365
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: afl
         type: eq_ref
possible_keys: PRIMARY,afl_filter,afl_filter_2
          key: PRIMARY
      key_len: 8
          ref: aflg.afl_id
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: af
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: enwiki.afl.afl_filter
         rows: 1
        Extra: Using where
*************************** 4. row ***************************
           id: 2
  select_type: DERIVED
        table: abuse_filter_log
         type: range
possible_keys: NULL
          key: afl_filter_2
      key_len: 66
          ref: NULL
         rows: 88365
        Extra: Using index for group-by

Rows read:

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 3292  |
| Handler_read_last          | 1     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 823   |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 1648  |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 1646  |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.06 sec)

(most of the temporary table is for the last order by)

The string-to-int conversion is not a blocker, but I would recommend doing it afterwards too, as keeping the string type can have negative effects, as shown above.

With this I consider the optimization help "fully done", and you can contact us back when you are ready for the schema change (it should take only 2-3 minutes per host).

@jcrespo Thanks for the deep and useful analysis! I'll submit the patch for the schema change later or tomorrow. Shall I open a new ticket for that?

The schema change (the code and its deployment) is up to you and how you organize yourselves; for the "deployment of a schema change on WMF servers" we do indeed require a separate ticket tagged Blocked-on-schema-change, for organizational purposes. Details on the workflow are at https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change
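For the record, a minimal sketch of what the .sql patch for the new index might look like (the index name is hypothetical; MediaWiki patch files use the /*_*/ table-prefix and /*i*/ index-name placeholders):

-- Adds the (afl_filter, afl_id) compound index recommended above.
CREATE INDEX /*i*/afl_filter_id ON /*_*/abuse_filter_log (afl_filter, afl_id);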

jcrespo added a comment. · Edited · Feb 6 2019, 6:08 PM

Oh, I just realized that I based the "last hit" on the id, not the timestamp. I hope that works; if not, the idea would be very similar (and maybe you wouldn't need the new index). I just thought that the id would be a more accurate representation of the "last hit".

E.g.: this may work without the new index, as I use timestamp instead of id:

EXPLAIN SELECT af.*, afl.*  FROM abuse_filter af JOIN abuse_filter_log afl ON af.af_id = afl.afl_filter JOIN (SELECT afl_filter, MAX(afl_timestamp) AS afl_timestamp FROM abuse_filter_log GROUP BY afl_filter) as aflg ON afl.afl_timestamp = aflg.afl_timestamp AND afl.afl_filter = aflg.afl_filter ORDER BY af.af_id\G
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 3292  |
| Handler_read_last          | 1     |
| Handler_read_next          | 1279  |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 834   |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 1659  |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 1657  |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.05 sec)

Yes, I meant a new task with Blocked-on-schema-change, thanks. As for the ID/timestamp, the ID is of course better to use. I think the only reason the query used afl_timestamp is the filter_timestamp index.

Daimona moved this task from Backlog to Next on the User-Daimona board.

^see my comment update (the table already has some index overhead).

I tried to implement the query above in IndexPager, but I cannot get it to work... I pushed a new version of the change as a POC, but IndexPager doesn't do what we expect. Instead, it executes the following query:

SELECT af_id,af_enabled,af_deleted,af_pattern,af_global,af_public_comments,af_hidden,af_hit_count,af_timestamp,af_user_text,af_user,af_actions,af_group,af_throttled FROM `abuse_filter` WHERE af_deleted = '0' AND af_global = '0' ORDER BY af_id LIMIT 51

where join_conds are completely ignored. I'm unsure whether this is some IndexPager limitation, or I just did it wrong (or both).
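For comparison, here is a sketch of the shape we expected the pager to produce: a simplified variant of jcrespo's recommendation with the pager's usual WHERE/ORDER BY/LIMIT and an illustrative field list.

-- Illustration only, not the actual output of the patch; the derived table
-- supplies the last hit per filter.
SELECT af.af_id, af.af_public_comments, aflg.afl_timestamp
FROM abuse_filter af
LEFT JOIN (
  SELECT afl_filter, MAX(afl_timestamp) AS afl_timestamp
  FROM abuse_filter_log
  GROUP BY afl_filter
) AS aflg ON af.af_id = aflg.afl_filter
WHERE af.af_deleted = '0' AND af.af_global = '0'
ORDER BY af.af_id
LIMIT 51;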

Yeah, whether the MediaWiki SQL abstraction supports it or not is a different beast; you should ask a MediaWiki expert (that's not me :-)) whether JOINing a subquery is supported, has a separate syntax, or can be overridden in some way.

Daimona moved this task from Next to Future on the User-Daimona board. · Mar 4 2019, 3:01 PM
Daimona removed Daimona as the assignee of this task. · Jul 23 2019, 10:18 AM

Unlicking the cookie, I'm not currently on this.