
Rethink slow filters and profiling after enabling CachingParser
Open, Needs Triage, Public

Description

The new CachingParser will have a shorter runtime, as previewed in T156095#5434572. This means that the current slow-filters system will mostly stop working, because the current threshold is too high. The problem is that cold-cache executions will still have runtimes similar to the ones we get now. So one idea could be to have two different thresholds, one for cold cache and one for warm cache. But that's not easy: runtime recording is handled inside AbuseFilterRunner and is not limited to parsing, so the Parser has no knowledge of it.
Of note, regarding the runtime of all filters together: some of them may hit a warm cache and others may not, so the grouped runtime may be of limited use.

A possible implementation would be a (singleton?) profiler class, as I said in the comments of https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/498978/, with the thresholds determined from runtime data in production. But for aggregated stats, there's not much I can think of.
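For illustration, a minimal sketch of what such a profiler could look like, with separate thresholds for cold and warm cache. The class, its methods and the threshold values are hypothetical placeholders, to be tuned against production data:

```php
<?php
// Minimal sketch only: class, methods and thresholds are hypothetical
// placeholders, to be tuned against production data.
class FilterProfiler {
	private const COLD_CACHE_THRESHOLD_MS = 500;
	private const WARM_CACHE_THRESHOLD_MS = 50;

	/** @var self|null */
	private static $instance = null;

	public static function getInstance(): self {
		if ( self::$instance === null ) {
			self::$instance = new self();
		}
		return self::$instance;
	}

	/**
	 * Called by the runner once per filter, with the cache state as
	 * reported by the parser.
	 */
	public function recordFilter( string $filterID, float $timeMs, bool $warmCache ): void {
		$threshold = $warmCache
			? self::WARM_CACHE_THRESHOLD_MS
			: self::COLD_CACHE_THRESHOLD_MS;
		if ( $timeMs > $threshold ) {
			wfDebugLog(
				'AbuseFilter',
				"Slow filter $filterID: {$timeMs}ms (warm cache: " . (int)$warmCache . ')'
			);
		}
	}
}
```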

Event Timeline

I think it'd be great if we could make it very rare/unlikely to have a cache miss, and then apply the per-filter measures only when the rule-parsing was cached. We could then have a separate metric (in Graphite/Logstash, for developers) to closely monitor when cache entries are missing.

There would still be the overall measure for all filters combined (not skipping anything) to make sure we don't regress in terms of impact on Save Timing.

Would it be feasible to make cache misses nearly impossible? Some of the below may already be the case, but here are some ideas:

  • We can avoid a guaranteed cache miss after a filter modification, by parsing and caching the result automatically when modifying/creating an abuse filter. There are several features in WANObjectCache (key touching, check keys, staleTTL, lockTSE, async pre-emptive regeneration) that can help us minimise chances of cache misses (see the sketch after this list).
  • The cache might fall out for a filter that is only rarely reached (e.g. other rules always match first); in that case, we could do a batch query on the cache (getMulti) to always get them together and thus keep them popular. Alternatively, if we want to optimise for shorter individual steps and quicker early return, we could keep things warm by having a deferred update that checks the unused filters and ensures they are re-parsed/cached if they fell out.
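As a rough sketch of how the parsed result could be cached with those WANObjectCache features: the cache key layout and parseFilterRules() below are assumptions, while checkKeys, lockTSE and staleTTL are real options of WANObjectCache::getWithSetCallback().

```php
<?php
// Illustrative sketch only: key layout and parseFilterRules() are assumptions;
// checkKeys, lockTSE and staleTTL are real getWithSetCallback() options.
use MediaWiki\MediaWikiServices;

function getCachedFilterAst( int $filterId, string $rules ) {
	$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
	$key = $cache->makeKey( 'abusefilter', 'ast', $filterId, md5( $rules ) );

	return $cache->getWithSetCallback(
		$key,
		$cache::TTL_MONTH,
		static function () use ( $rules ) {
			// Hypothetical parse step producing a serialisable AST.
			return parseFilterRules( $rules );
		},
		[
			// A single check key, touched on save, marks all filter ASTs stale at once.
			'checkKeys' => [ $cache->makeKey( 'abusefilter', 'ast-epoch' ) ],
			// Let one runner regenerate while others briefly reuse a stale value.
			'lockTSE' => 30,
			// Keep the stale value around for a week past expiry as a fallback.
			'staleTTL' => $cache::TTL_WEEK,
		]
	);
}
```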

With some of the above applied, and a long expiry (month or year), and assuming it won't expire/fall out of Memcached for other reasons - in theory - what scenarios could still result in a cache miss?

I think it'd be great if we could make it very rare/unlikely to have a cache miss, and then apply the per-filter measures only when the rule-parsing was cached. We could then have a separate metric (in Graphite/Logstash, for developers) to closely monitor when cache entries are missing.

Collecting data only for warm runs makes sense. As I said in the description, profiling data is saved in AbuseFilterRunner, while only the Parser has knowledge about cache hits/misses. So we'd have to write some additional logic, but it shouldn't be too complicated. And not showing cold-run data in the UI is fine, but only as long as we really manage to greatly reduce cache misses.
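A rough sketch of that extra logic, assuming a hypothetical cache-hit flag passed from the Parser to the Runner and placeholder statsd keys: per-filter timings are recorded only on warm runs, while cold runs just feed an aggregate miss counter (the separate Graphite metric mentioned above).

```php
<?php
// Rough sketch of the "record only warm runs, count misses separately" idea.
// The $cacheHit flag is a hypothetical piece of information the Parser would
// need to expose to the Runner; the statsd keys are placeholders.
use MediaWiki\MediaWikiServices;

function profileFilterRun( string $filterID, float $timeMs, bool $cacheHit ): void {
	$stats = MediaWikiServices::getInstance()->getStatsdDataFactory();

	if ( $cacheHit ) {
		// Per-filter timing is only meaningful when the parse was cached.
		$stats->timing( "abusefilter.runtime.filter.$filterID", $timeMs );
	} else {
		// Cold runs only feed an aggregate miss counter for monitoring.
		$stats->increment( 'abusefilter.cachingparser.miss' );
	}
}
```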

There would still be the overall measure for all filters combined (not skipping anything) to make sure we don't regress in terms of impact on Save Timing.

For sure. And after all, I don't think we should care about that runtime being a mix of cold and warm runs. Currently, it's already a mix of filters bailing out at the first condition and filters where every condition is checked, so it should only represent the average extra save time.

Would it be feasible to make cache misses nearly impossible?

We can definitely try :-)

  • We can avoid a guaranteed cache miss after a filter modification, by parsing and caching the result automatically when modifying/creating an abuse filter.

I think this is already in place (inadvertently): when saving a filter, we run a syntax check, which in turn involves parsing the filter and will cache the tokens and the AST with the new parser. And that will keep working if we change the syntax check so that it only tries to build the AST, without evaluating it (T231536).
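In other words, the save path warms the cache as a side effect of the syntax check. A sketch only; the function name is illustrative, not the extension's real entry point:

```php
<?php
// Sketch of the incidental pre-caching on save: the syntax check parses the
// rules, and with the CachingParser that parse also stores the tokens/AST.
function checkSyntaxAndWarmCache( int $filterId, string $rules ): bool {
	try {
		// Building the AST is enough to populate the cache, even if the
		// syntax check later stops evaluating the filter (T231536).
		getCachedFilterAst( $filterId, $rules ); // helper from the earlier sketch
		return true;
	} catch ( Exception $e ) {
		// Syntax error: nothing gets cached and the filter is not saved.
		return false;
	}
}
```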

There are several features in WANObjectCache (key touching, check keys, staleTTL, lockTSE, async pre-emptive regeneration) that can help us minimise chances of cache misses.

We could also use key touching as we do for global filters, yes. [1]

  • The cache might fall out for a filter that is only rarely reached (e.g. other rules always match first); in that case, we could do a batch query on the cache (getMulti) to always get them together and thus keep them popular. Alternatively, if we want to optimise for shorter individual steps and quicker early return, we could keep things warm by having a deferred update that checks the unused filters and ensures they are re-parsed/cached if they fell out.

Every filter is always executed (unless too many conditions are consumed, but this shouldn't happen on WMF wikis), so every edit will recache all missing filters.
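For reference, the getMulti + deferred update idea from the quoted bullet could look roughly like this; key layout and helpers are the same assumptions as in the earlier sketches:

```php
<?php
// Sketch: batch lookup of all filter AST keys, then re-parse whatever is
// missing in a deferred update.
use MediaWiki\MediaWikiServices;

function warmFilterAstCache( array $filters ): void {
	$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();

	$keysToFilters = [];
	foreach ( $filters as $filterId => $rules ) {
		$key = $cache->makeKey( 'abusefilter', 'ast', $filterId, md5( $rules ) );
		$keysToFilters[$key] = [ $filterId, $rules ];
	}

	// One batched lookup keeps rarely-reached filters "popular" in memcached.
	$found = $cache->getMulti( array_keys( $keysToFilters ) );

	DeferredUpdates::addCallableUpdate( static function () use ( $keysToFilters, $found ) {
		foreach ( $keysToFilters as $key => [ $filterId, $rules ] ) {
			if ( !array_key_exists( $key, $found ) ) {
				// Re-parse and re-cache the entry that fell out of the cache.
				getCachedFilterAst( $filterId, $rules ); // helper from the earlier sketch
			}
		}
	} );
}
```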

With some of the above applied, and a long expiry (month or year), and assuming it won't expire/fall out of Memcached for other reasons - in theory - what scenarios could still result in a cache miss?

No other scenarios. Currently, the TTL is one day in the local server cache. We could increase it and use key touching as you proposed above.

[1] - For global filters, we cache the results of the query on the foreign DB. That has a long TTL, and the key is touched when changing a global filter. The mechanism here would be a little different, but we could also cache the query for local filters and delegate all the caching to a separate method.
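A sketch of that approach applied to the local filter query, using a check key that gets touched on save. Key names and loadActiveFiltersFromDb() are assumptions; getWithSetCallback() and touchCheckKey() are real WANObjectCache methods.

```php
<?php
// Sketch of the key-touching approach applied to the local filter query.
use MediaWiki\MediaWikiServices;

function getActiveLocalFilters(): array {
	$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();

	return $cache->getWithSetCallback(
		$cache->makeKey( 'abusefilter', 'active-filters' ),
		$cache::TTL_MONTH,
		static function () {
			// Hypothetical DB query loading the enabled local filters.
			return loadActiveFiltersFromDb();
		},
		[ 'checkKeys' => [ $cache->makeKey( 'abusefilter', 'filters-epoch' ) ] ]
	);
}

function onLocalFilterSaved(): void {
	$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
	// Touching the check key marks every dependent cache entry as stale at once.
	$cache->touchCheckKey( $cache->makeKey( 'abusefilter', 'filters-epoch' ) );
}
```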

Change 537499 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Group slow filters by warm/cold cache

https://gerrit.wikimedia.org/r/537499

The implementation above should be good. Ensuring as few cache misses as possible could be the scope of another task, to be opened after looking at prod results.