Page MenuHomePhabricator

Improve CachingParser performance (2019)
Closed, ResolvedPublic

Description

This task is an umbrella for several performance improvements to the new parser.

TODO: Tweak caching: try to avoid stampedes, increase TTL; also, try to cache all the ASTs at the same time and maybe use WANCache.

Event Timeline

Change 540383 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Log deprecated vars in the cached phase in the new parser

https://gerrit.wikimedia.org/r/540383

Change 540145 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Replace array_map with foreach

https://gerrit.wikimedia.org/r/540145

Change 540385 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Remove AFPData::dup

https://gerrit.wikimedia.org/r/540385

Change 540145 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Replace array_map with foreach

https://gerrit.wikimedia.org/r/540145

Change 540385 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Remove AFPData::dup

https://gerrit.wikimedia.org/r/540385

XHGui is currently broken (T240066), but I think that the best we can do is cache all ASTs together. For a wiki as big as enwiki, this would mean one hit to memcached for every request, instead of 197. A possible implementation:

  • First of all, we should factor out a method like getAllFilters from AbuseFilterRunner::checkAllFilters
  • Then, we'd need a helper method to build a cache key from a list of local/global filters
  • In checkAllFilters, we first check if there's anything stored in that key.
    • If so, we retrieve the data (which would be a list of ASTs) and give the AST to the parser (as opposed to passing the bare content of the filter)
    • If the key is empty, we should set it at the end of checkAllFilters; we'd need a method to retrieve the AST from the parser, after the parsing is done.
  • Whenever a filter is modified, the key isn't valid anymore, so we have two options:
    • Invalidate the key altogether
    • Since, when saving a filter, we will also parse it to check syntax, we may selectively replace the AST for that filter and re-cache. Note, this would have to happen *after* the filter is saved, to avoid caching when the edit fails (maybe in a DeferredUpdate?)

For some of the steps above, we need new methods in the Parser class. The main problem is the presence of the old parser, which could force us to add some hacks.

Change 540383 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Log deprecated vars in the cached phase in the new parser

https://gerrit.wikimedia.org/r/540383

I will investigate whether it's feasible to cache all ASTs at the same time.

Krinkle awarded a token.
Krinkle moved this task from Watching to Perf recommendation on the Performance-Team (Radar) board.
Krinkle subscribed.

I've moved this to a new task for now since this wil otherwise stay open forever, I think! Thanks for all the work, this has been awesome.

T289923: Let AbuseFilter cache its parser AST result

Krinkle renamed this task from Improve CachingParser performance to Improve CachingParser performance (2019-2020).Aug 28 2021, 6:08 PM
Krinkle renamed this task from Improve CachingParser performance (2019-2020) to Improve CachingParser performance (2019).