rEABFd19ced4cef8d: Filter AbuseLog by the "impact" of the change didn't assume it was unused, so the filter is quite useless.
- Mentioned In
- T226851: Drop abuse_filter_log.afl_log_id in production
- Mentioned Here
- rEABF1ee749a96671: AbuseFilter: Add .sql patch for the changes in r68584. Not used by anything in…
rEABF5e4289ce4e24: AbuseFilter: Resolve bugs 18374, 28633. * Store the revision ID associated with…
rEABFd19ced4cef8d: Filter AbuseLog by the "impact" of the change
T204126: AbuseLog impact search should take page deletion into account
Cool! Introduced with rEABF5e4289ce4e240f5a91e7c6fd68bab32b8fe7074c, I see, but never used. Its counterpart, afl_rev_id, is used to show the diff link to successfully saved revisions. I guess afl_log_id was meant to be a foreign key to logging.log_id, but I cannot get why. Was it meant to show "a diff to a log entry"? Lol. Some copypasta from core tables which hold bot *_rev_id and *_log_id?
Was it meant to show "a diff to a log entry"?
This is how I imagined it. But this remark in the linked commit is interesting:
Store the revision ID associated with a log entry if the action is successful
True. That comment leads in fact to think that it was meant to show a sort of "log diff". Probably that was the idea, then Andrew forgot to use the column, or changed his mind but forgot to remove it. Another possibility is that the column is there to make the table more similar to other core tables. See for instance rEABF1ee749a966714a1c82d45520e7a025c847a3ed86: afl_patrolled_by is unused as well (although in this case we're sure that it was meant to be unused).
Entering this discussion abruptly, so I'm not sure how useful my comment is, but: please note that unlike edits (which are all in one place), logs are stored in different places, and it is wrong to assume that this field is meant to point to a logging.log_id as, to the extent I understand it, logs that are in other tables can also trigger filters. At least in theory. Unless I am totally misunderstanding the code.
Alright, so I think this afl_log_id field should be removed altogether. Not only because we don't use it, but it makes queries slower. For instance, I see the following logged as slow on tendril for commonswiki:
SELECT * FROM `abuse_filter_log` LEFT JOIN `abuse_filter` ON ((af_id=afl_filter)) WHERE (NOT ( afl_rev_id IS NULL AND afl_log_id IS NULL )) AND afl_deleted = '0' ORDER BY afl_timestamp DESC LIMIT 51
Running it with an explain on quarry says
id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE abuse_filter_log index_merge afl_rev_id,afl_log_id afl_rev_id,afl_log_id 5,5 363569 Using sort_union(afl_rev_id,afl_log_id); Using where; Using filesort 1 SIMPLE abuse_filter eq_ref PRIMARY PRIMARY 8 commonswiki.abuse_filter_log.afl_filter 1 Using where
So it's scanning 363569 rows + a filesort. Removing the afl_log_id part, which is unused anyway, is much faster. I don't have an explain from quarry because the query is too fast, but this one is from my local wiki (where the query above scans 3055 rows):
1 SIMPLE abuse_filter_log index afl_rev_id afl_timestamp 14 NULL 91 Using where 1 SIMPLE abuse_filter eq_ref PRIMARY PRIMARY 8 my_wiki.abuse_filter_log.afl_filter 1 Using where
And this is instantaneous.
The reason which makes it happen is also ovious: afl_log_id is not indexed. So, given that it's been unused for years, we can safely go ahead and free some DB space. We can always re-add the field in the future if we find the need for it, and possibly with the right indexes.