Page MenuHomePhabricator

Re-enable AbuseFilterCachingParser once we are sure it's safe
Closed, ResolvedPublic

Description

Tracking task for known incompatibilities between the new and old rule parser, before we can re-enabling the new parser.

Previous issues of site performance, citing AbuseFilter as contributing factor or root cause:

As the parser will be working, unit tests for it should be enabled (see https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/453996/)


Plan for finishing AbuseFilterCachingParser
  1. Fix "Short-circuit evaluation with multiple comparisons". – T218906, https://gerrit.wikimedia.org/r/498127
  2. Fix "Short circuit with assignments . –" T214674 (only if the fix is easy; to be discussed...)
  3. Fix and re-enable unit tests for CachingParser. – https://gerrit.wikimedia.org/r/454557
  4. Deprecate "wild" abusefilter syntax. – T156096
  5. Investigate and fix AbuseFilterCachingParser bug with "if" conditions. – T152281
  6. Improve parameters handling - https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/531140/
  7. Allow ifs without else and reject empty conditions (because CachingParser doesn't support empty if). – T230727
  8. Speed up hoisting for the CachingParser – T230982
  9. Report errors inside short-circuited blocks when checking syntax. – T232498
  10. Change the CachingParser to throw an AFPUserVisibleException instead of a TypeError for empty operands. – T156096, T153251
  11. Fix any on-wiki filters that use "wild" abusefilter syntax, including trailing commas in function calls (T153251) – T156096
  12. Right before enabling the new parser, re-check logstash and ensure no empty operands are logged – T156096, T153251
  13. Enable CachingParser in production according to the plan below
  14. Use CachingParser as default in extension.json - Not before completing the switch in production, ensuring that we have no regressions, and completing the deprecations above; tentatively proposing MW 1.35.
Plan for deploying AbuseFilterCachingParser

(See T156095#5434572 for a preview)

  1. Establish base line metric for amount of time spent in AbuseFilter during saving of edits. For example, the time spent in AbuseFilter within Backend Save Timing (mean avg, median, p75; for three different days in a given week), as well as the total amount of time (median, p75, for three different days).
  2. Enable on Beta Cluster and group0 wikis and monitor impact on the above metrics as well as the general Save Timing and Backend Save Timing dashboards in Grafana.
  3. If no performance degradation (T156095#5425912), enable on group1.
  4. If no performance degradation, enable on group2.
Nice-to-haves
  1. Fix AbuseFilterCachingParser bug causing stack overflow. – T148660 (not a blocker, per T148660#5411970)
Other perf work on AbuseFilter
  1. Reduce the frequency at which we store send data to the cache.
  2. T53294: Overhaul internal profiling system
  3. T219092: Runtime profiling shouldn't include time taken to parse wikitext

Details

Related Gerrit Patches:
mediawiki/extensions/AbuseFilter : masterAlign arg counting between the parsers
operations/mediawiki-config : masterMove the remaining wikis to AbuseFilterCachingParser
operations/mediawiki-config : masterEnable AbuseFilterCachingParser for (almost) all wikis
operations/mediawiki-config : masterUse AbuseFilterCachingParser on wikidatawiki
operations/mediawiki-config : masterEnable AbuseFilterCachingParser for hewiki and commonswiki (group1)
operations/mediawiki-config : masterEnable AbuseFilterCachingParser on itwiki and nlwiki
operations/mediawiki-config : masterUse AbuseFilterCachingParser for group0 and deploymentwiki
mediawiki/extensions/AbuseFilter : wmf/1.34.0-wmf.23Actually provide a StatsdDataFactory to the parser
mediawiki/extensions/AbuseFilter : masterActually provide a StatsdDataFactory to the parser
mediawiki/extensions/AbuseFilter : masterAdd profiling points throughout the code for the CachingParser switch
mediawiki/extensions/AbuseFilter : wmf/1.34.0-wmf.23Add profiling points throughout the code for the CachingParser switch
mediawiki/extensions/AbuseFilter : masterBetter handling of function params in CachingParser

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@Krinkle nice! I was looking at xhgui and noticed some performance issues with the current parser. I'd like to do some debugging with the new parser on a mwdebug server, if that's fine for you. If so, could you please reach out to me on IRC when you're available? Thanks.

This was done now, below are the results. For all of the tests, I was logged in with my account, and made a test edit in my sandbox, on both itwiki and enwiki. I repeated all edits twice, so to see the effect when the cache is cold/warm. Note that for some edits there are multiple entries, because every time the summary field loses focus we re-run filters in 'stash' mode.

Results (averaged)

AbuseFilterParser

itwiki
MethodTime (inclusive, in ms)Memory (inclusive, in bytes)cache
AbuseFilterTokenizer::getTokens1422,527,864cold
AbuseFilterParser::doLevelEntry259392,096cold
AbuseFilterParser::doLevelEntry1552,210,160warm
enwiki
MethodTime (inclusive, in ms)Memory (inclusive, in bytes)cache
AbuseFilterTokenizer::getTokens2364,300,840cold
AbuseFilterParser::doLevelEntry1131514,544cold
AbuseFilterParser::doLevelEntry2422,675,000warm

AbuseFilterCachingParser

Note: Where marked with *, AbuseFilterCachingParser::intEval includes AbuseFilterTokenizer and AFPTreeParser.

itwiki
MethodTime (inclusive, in ms)Memory (inclusive, in bytes)cache
AbuseFilterTokenizer::getTokens1003,527,872cold
AFPTreeParser::doLevelEntry440787,600cold
AbuseFilterCachingParser::intEval *9003,331,216cold
AbuseFilterCachingParser::intEval117116,000warm
enwiki
MethodTime (inclusive, in ms)Memory (inclusive, in bytes)cache
AbuseFilterTokenizer::getTokens1614,265,928cold
AFPTreeParser::doLevelEntry313787,168cold
AbuseFilterCachingParser::intEval *7103,765,040cold
AbuseFilterCachingParser::intEval17395,000warm

Conclusions

On itwiki, the old parser takes roughly 400ms and 3MB; the new parser takes 900ms and 3.3MB when cold, but only 117ms and 0.1MB when warm.
On enwiki, the old parser takes roughly 1300ms and 5MB; the new parser takes 710s and 3.7MB when cold, but only 173ms and 0.1MB when warm.
Of course we'd need way more data to take actual conclusions, but this looks very promising (as expected). The new parser seems to take slightly longer when cold, but that could just be a coincidence. We'll have time to investigate after deploying it in production.
While warm, it takes 3-6 times less time, and ~40 times less memory. That's huge. And if we resolve T230982, I believe we could be able to halve the warm runtime.


Links

AbuseFilterParser

itwiki
enwiki

AbuseFilterCachingParser

itwiki
enwiki
Daimona updated the task description. (Show Details)Aug 23 2019, 6:40 PM
Huji awarded a token.Sep 6 2019, 3:22 PM
Daimona updated the task description. (Show Details)Sep 8 2019, 6:31 PM
Daimona updated the task description. (Show Details)Sep 9 2019, 12:16 PM
Daimona updated the task description. (Show Details)Sep 15 2019, 6:11 PM

@Krinkle Production is now CachingParser-ready! Hence, we can start the deploy whenever you're available. I'm also on IRC right now, in case you want to schedule a day and time when I'm available to oversee the transition.

Daimona updated the task description. (Show Details)Sep 16 2019, 1:50 PM

Change 537151 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Add profiling points throughout the code for the CachingParser switch

https://gerrit.wikimedia.org/r/537151

Change 537702 had a related patch set uploaded (by Krinkle; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@wmf/1.34.0-wmf.23] Add profiling points throughout the code for the CachingParser switch

https://gerrit.wikimedia.org/r/537702

Change 537151 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Add profiling points throughout the code for the CachingParser switch

https://gerrit.wikimedia.org/r/537151

Change 537702 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@wmf/1.34.0-wmf.23] Add profiling points throughout the code for the CachingParser switch

https://gerrit.wikimedia.org/r/537702

Mentioned in SAL (#wikimedia-operations) [2019-09-18T22:37:14Z] <krinkle@deploy1001> Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: T156095, ff44043efa59e9 (duration: 01m 05s)

Change 538071 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Actually provide a StatsdDataFactory to the parser

https://gerrit.wikimedia.org/r/538071

Change 538072 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@wmf/1.34.0-wmf.23] Actually provide a StatsdDataFactory to the parser

https://gerrit.wikimedia.org/r/538072

Change 538071 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Actually provide a StatsdDataFactory to the parser

https://gerrit.wikimedia.org/r/538071

Change 538072 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@wmf/1.34.0-wmf.23] Actually provide a StatsdDataFactory to the parser

https://gerrit.wikimedia.org/r/538072

Mentioned in SAL (#wikimedia-operations) [2019-09-19T17:48:01Z] <krinkle@deploy1001> Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: T156095, 32cf50453cd (duration: 01m 04s)

@Krinkle I was able to find the data on graphite. So, is there anything actionable, or should we just keep collecting data for a while?

Nope, just collecting for a week so that we have a good baseline comparison. Our trends are different every hour and every week day, but week over week is quite stable in my experience.

I've updated the Grafana dashboard to include the new metrics: https://grafana.wikimedia.org/d/000000393/mediawiki-abusefilter-profiling.

Could you give the queries a look over as well? It's easy to make mistakes here :)

Nope, just collecting for a week so that we have a good baseline comparison. Our trends are different every hour and every week day, but week over week is quite stable in my experience.

Sure. I see we have data for ~10 days now, so we may already be able to get some results.

I've updated the Grafana dashboard to include the new metrics: https://grafana.wikimedia.org/d/000000393/mediawiki-abusefilter-profiling.
Could you give the queries a look over as well? It's easy to make mistakes here :)

Looks good visually, although I don't know how to check the query behind it. I don't think I'm admin on grafana. The only thing I can think of is, we'll also need some data for the new parser: abusefilter_cachingParser_full, which is comparable with abusefilter_oldparser_full, and abusefilter_cachingParser_buildtree, which maybe should go to its own graph, although I expect it to be pretty similar to the _full version.


As for the next step, do we want to enable it on group0 for now?

Change 539674 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[operations/mediawiki-config@master] Use AbuseFilterCachingParser for group0

https://gerrit.wikimedia.org/r/539674

Krinkle added a project: User-notice.EditedSep 29 2019, 8:14 PM

For Tech-News:

Over the next two weeks, a new (faster) parser for AbuseFilter will be gradually deployed. This is expected to shorten the waiting time when saving edits. More information can be found at T156095 on Phabricator.

Krinkle updated the task description. (Show Details)Sep 29 2019, 9:00 PM

Change 539674 merged by jenkins-bot:
[operations/mediawiki-config@master] Use AbuseFilterCachingParser for group0 and deploymentwiki

https://gerrit.wikimedia.org/r/539674

Daimona updated the task description. (Show Details)Sep 30 2019, 6:28 PM

FTR, the results are visible at https://grafana.wikimedia.org/d/000000393/mediawiki-abusefilter-profiling?orgId=1. For now there's not much data, so it's not very meaningful. With @Krinkle we agreed to enable CachingParser on group1 around Wed this week. Then wait at least a week before going to group2.

For now there's not much data, so it's not very meaningful. With @Krinkle we agreed to enable CachingParser on group1 around Wed this week.

We can also do a handful of wikis tomorrow on Tuesday. Was thinking nlwiki (my homewiki, where I'm AF manager), and perhaps itwiki if you think that would be okay?

For now there's not much data, so it's not very meaningful. With @Krinkle we agreed to enable CachingParser on group1 around Wed this week.

We can also do a handful of wikis tomorrow on Tuesday. Was thinking nlwiki (my homewiki, where I'm AF manager), and perhaps itwiki if you think that would be okay?

Sounds great. Would 15:00UTC work for you?

Change 540103 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser on itwiki and nlwiki

https://gerrit.wikimedia.org/r/540103

Change 540145 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Replace array_map with foreach

https://gerrit.wikimedia.org/r/540145

Change 540103 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser on itwiki and nlwiki

https://gerrit.wikimedia.org/r/540103

Mentioned in SAL (#wikimedia-operations) [2019-10-01T17:09:28Z] <krinkle@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T156095 - c28baa1862401 (duration: 00m 59s)

By looking at the graphs on grafana, It seems like the p95 times are roughly identical for the two parsers, but the new parser has a lower upper. IMHO this is fine, so we can see what happens with the rest of the wikis. Then we can think about improving the performance of the new parser and reducing the cache misses. As a side note, it sounds like the evalTree time is higher than I would've expected (and the buildtree time is lower). But again, this is something we can improve later (e.g. with r540145).

Hence, in light of the above, would it be fine if we just switch the rest of the world to the new parser at once? Testing so far didn't show any particular error, so IMHO we can safely move the remaining wikis, and revert if something bad happens. @Krinkle thoughts?

Change 540516 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser for hewiki and commonswiki (group1)

https://gerrit.wikimedia.org/r/540516

Change 540516 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser for hewiki and commonswiki (group1)

https://gerrit.wikimedia.org/r/540516

Change 540627 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[operations/mediawiki-config@master] Use AbuseFilterCachingParser on wikidatawiki

https://gerrit.wikimedia.org/r/540627

Change 540627 merged by jenkins-bot:
[operations/mediawiki-config@master] Use AbuseFilterCachingParser on wikidatawiki

https://gerrit.wikimedia.org/r/540627

Change 540647 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser for (almost) all wikis

https://gerrit.wikimedia.org/r/540647

Change 540647 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser for (almost) all wikis

https://gerrit.wikimedia.org/r/540647

Krinkle updated the task description. (Show Details)Oct 3 2019, 9:52 PM

This is p99 of saving items in Wikidata:


Nicely done Daimona

This is p99 of saving items in Wikidata:


Nicely done Daimona

I cannot say for sure if it's related, but if it is, that's great :D Let's see if we can bring it down further with T234427.

Change 541026 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[operations/mediawiki-config@master] Move the remaining wikis to AbuseFilterCachingParser

https://gerrit.wikimedia.org/r/541026

While enabling the new parser I've also fixed arg counting, hence mrwiki 21 is now erroring out.
I've asked local admins to fix it, but there's not much activity on the wiki...

Change 541028 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Align arg counting between the parsers

https://gerrit.wikimedia.org/r/541028

Paladox added a subscriber: Paladox.Oct 6 2019, 7:56 PM

Change 541026 merged by jenkins-bot:
[operations/mediawiki-config@master] Move the remaining wikis to AbuseFilterCachingParser

https://gerrit.wikimedia.org/r/541026

Mentioned in SAL (#wikimedia-operations) [2019-10-07T22:40:21Z] <krinkle@deploy1001> Synchronized wmf-config/InitialiseSettings.php: 7b9e6829821, T156095 (duration: 00m 51s)

Krinkle closed this task as Resolved.Oct 8 2019, 12:01 AM
Krinkle updated the task description. (Show Details)
Krinkle edited projects, added Performance-Team-publish; removed Patch-For-Review.
Krinkle moved this task from Untriaged to Archive on the Performance-Team-publish board.