Page MenuHomePhabricator

Re-enable AbuseFilterCachingParser once we are sure it's safe
Closed, ResolvedPublic

Description

Tracking task for known incompatibilities between the new and old rule parser, before we can re-enabling the new parser.

Previous issues of site performance, citing AbuseFilter as contributing factor or root cause:

As the parser will be working, unit tests for it should be enabled (see https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/453996/)


Plan for finishing AbuseFilterCachingParser
  1. Fix "Short-circuit evaluation with multiple comparisons". – T218906, https://gerrit.wikimedia.org/r/498127
  2. Fix "Short circuit with assignments . –" T214674 (only if the fix is easy; to be discussed...)
  3. Fix and re-enable unit tests for CachingParser. – https://gerrit.wikimedia.org/r/454557
  4. Deprecate "wild" abusefilter syntax. – T156096
  5. Investigate and fix AbuseFilterCachingParser bug with "if" conditions. – T152281
  6. Improve parameters handling - https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/531140/
  7. Allow ifs without else and reject empty conditions (because CachingParser doesn't support empty if). – T230727
  8. Speed up hoisting for the CachingParser – T230982
  9. Report errors inside short-circuited blocks when checking syntax. – T232498
  10. Change the CachingParser to throw an AFPUserVisibleException instead of a TypeError for empty operands. – T156096, T153251
  11. Fix any on-wiki filters that use "wild" abusefilter syntax, including trailing commas in function calls (T153251) – T156096
  12. Right before enabling the new parser, re-check logstash and ensure no empty operands are logged – T156096, T153251
  13. Enable CachingParser in production according to the plan below
  14. Use CachingParser as default in extension.json - Not before completing the switch in production, ensuring that we have no regressions, and completing the deprecations above; tentatively proposing MW 1.35.
Plan for deploying AbuseFilterCachingParser

(See T156095#5434572 for a preview)

  1. Establish base line metric for amount of time spent in AbuseFilter during saving of edits. For example, the time spent in AbuseFilter within Backend Save Timing (mean avg, median, p75; for three different days in a given week), as well as the total amount of time (median, p75, for three different days).
  2. Enable on Beta Cluster and group0 wikis and monitor impact on the above metrics as well as the general Save Timing and Backend Save Timing dashboards in Grafana.
  3. If no performance degradation (T156095#5425912), enable on group1.
  4. If no performance degradation, enable on group2.
Nice-to-haves
  1. Fix AbuseFilterCachingParser bug causing stack overflow. – T148660 (not a blocker, per T148660#5411970)
Other perf work on AbuseFilter
  1. Reduce the frequency at which we store send data to the cache.
  2. T53294: Overhaul internal profiling system
  3. T219092: Runtime profiling shouldn't include time taken to parse wikitext

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@Krinkle Production is now CachingParser-ready! Hence, we can start the deploy whenever you're available. I'm also on IRC right now, in case you want to schedule a day and time when I'm available to oversee the transition.

Change 537151 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Add profiling points throughout the code for the CachingParser switch

https://gerrit.wikimedia.org/r/537151

Change 537702 had a related patch set uploaded (by Krinkle; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@wmf/1.34.0-wmf.23] Add profiling points throughout the code for the CachingParser switch

https://gerrit.wikimedia.org/r/537702

Change 537151 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Add profiling points throughout the code for the CachingParser switch

https://gerrit.wikimedia.org/r/537151

Change 537702 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@wmf/1.34.0-wmf.23] Add profiling points throughout the code for the CachingParser switch

https://gerrit.wikimedia.org/r/537702

Mentioned in SAL (#wikimedia-operations) [2019-09-18T22:37:14Z] <krinkle@deploy1001> Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: T156095, rEABFff44043efa59 (duration: 01m 05s)

Change 538071 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Actually provide a StatsdDataFactory to the parser

https://gerrit.wikimedia.org/r/538071

Change 538072 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@wmf/1.34.0-wmf.23] Actually provide a StatsdDataFactory to the parser

https://gerrit.wikimedia.org/r/538072

Change 538071 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Actually provide a StatsdDataFactory to the parser

https://gerrit.wikimedia.org/r/538071

Change 538072 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@wmf/1.34.0-wmf.23] Actually provide a StatsdDataFactory to the parser

https://gerrit.wikimedia.org/r/538072

Mentioned in SAL (#wikimedia-operations) [2019-09-19T17:48:01Z] <krinkle@deploy1001> Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: T156095, rEABF32cf50453cd1 (duration: 01m 04s)

@Krinkle I was able to find the data on graphite. So, is there anything actionable, or should we just keep collecting data for a while?

Nope, just collecting for a week so that we have a good baseline comparison. Our trends are different every hour and every week day, but week over week is quite stable in my experience.

I've updated the Grafana dashboard to include the new metrics: https://grafana.wikimedia.org/d/000000393/mediawiki-abusefilter-profiling.

Could you give the queries a look over as well? It's easy to make mistakes here :)

Nope, just collecting for a week so that we have a good baseline comparison. Our trends are different every hour and every week day, but week over week is quite stable in my experience.

Sure. I see we have data for ~10 days now, so we may already be able to get some results.

I've updated the Grafana dashboard to include the new metrics: https://grafana.wikimedia.org/d/000000393/mediawiki-abusefilter-profiling.

Could you give the queries a look over as well? It's easy to make mistakes here :)

Looks good visually, although I don't know how to check the query behind it. I don't think I'm admin on grafana. The only thing I can think of is, we'll also need some data for the new parser: abusefilter_cachingParser_full, which is comparable with abusefilter_oldparser_full, and abusefilter_cachingParser_buildtree, which maybe should go to its own graph, although I expect it to be pretty similar to the _full version.


As for the next step, do we want to enable it on group0 for now?

Change 539674 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[operations/mediawiki-config@master] Use AbuseFilterCachingParser for group0

https://gerrit.wikimedia.org/r/539674

For Tech-News:

Over the next two weeks, a new (faster) parser for AbuseFilter will be gradually deployed. This is expected to shorten the waiting time when saving edits. More information can be found at T156095 on Phabricator.

Change 539674 merged by jenkins-bot:
[operations/mediawiki-config@master] Use AbuseFilterCachingParser for group0 and deploymentwiki

https://gerrit.wikimedia.org/r/539674

FTR, the results are visible at https://grafana.wikimedia.org/d/000000393/mediawiki-abusefilter-profiling?orgId=1. For now there's not much data, so it's not very meaningful. With @Krinkle we agreed to enable CachingParser on group1 around Wed this week. Then wait at least a week before going to group2.

For now there's not much data, so it's not very meaningful. With @Krinkle we agreed to enable CachingParser on group1 around Wed this week.

We can also do a handful of wikis tomorrow on Tuesday. Was thinking nlwiki (my homewiki, where I'm AF manager), and perhaps itwiki if you think that would be okay?

For now there's not much data, so it's not very meaningful. With @Krinkle we agreed to enable CachingParser on group1 around Wed this week.

We can also do a handful of wikis tomorrow on Tuesday. Was thinking nlwiki (my homewiki, where I'm AF manager), and perhaps itwiki if you think that would be okay?

Sounds great. Would 15:00UTC work for you?

Change 540103 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser on itwiki and nlwiki

https://gerrit.wikimedia.org/r/540103

Change 540145 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Replace array_map with foreach

https://gerrit.wikimedia.org/r/540145

Change 540103 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser on itwiki and nlwiki

https://gerrit.wikimedia.org/r/540103

Mentioned in SAL (#wikimedia-operations) [2019-10-01T17:09:28Z] <krinkle@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T156095 - c28baa1862401 (duration: 00m 59s)

By looking at the graphs on grafana, It seems like the p95 times are roughly identical for the two parsers, but the new parser has a lower upper. IMHO this is fine, so we can see what happens with the rest of the wikis. Then we can think about improving the performance of the new parser and reducing the cache misses. As a side note, it sounds like the evalTree time is higher than I would've expected (and the buildtree time is lower). But again, this is something we can improve later (e.g. with r540145).

Hence, in light of the above, would it be fine if we just switch the rest of the world to the new parser at once? Testing so far didn't show any particular error, so IMHO we can safely move the remaining wikis, and revert if something bad happens. @Krinkle thoughts?

Change 540516 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser for hewiki and commonswiki (group1)

https://gerrit.wikimedia.org/r/540516

Change 540516 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser for hewiki and commonswiki (group1)

https://gerrit.wikimedia.org/r/540516

Change 540627 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[operations/mediawiki-config@master] Use AbuseFilterCachingParser on wikidatawiki

https://gerrit.wikimedia.org/r/540627

Change 540627 merged by jenkins-bot:
[operations/mediawiki-config@master] Use AbuseFilterCachingParser on wikidatawiki

https://gerrit.wikimedia.org/r/540627

Change 540647 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser for (almost) all wikis

https://gerrit.wikimedia.org/r/540647

Change 540647 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable AbuseFilterCachingParser for (almost) all wikis

https://gerrit.wikimedia.org/r/540647

This is p99 of saving items in Wikidata:

image.png (789×1 px, 81 KB)

Nicely done Daimona

This is p99 of saving items in Wikidata:

image.png (789×1 px, 81 KB)

Nicely done Daimona

I cannot say for sure if it's related, but if it is, that's great :D Let's see if we can bring it down further with T234427.

Change 541026 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[operations/mediawiki-config@master] Move the remaining wikis to AbuseFilterCachingParser

https://gerrit.wikimedia.org/r/541026

While enabling the new parser I've also fixed arg counting, hence mrwiki 21 is now erroring out.
I've asked local admins to fix it, but there's not much activity on the wiki...

Change 541028 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Align arg counting between the parsers

https://gerrit.wikimedia.org/r/541028

Change 541026 merged by jenkins-bot:
[operations/mediawiki-config@master] Move the remaining wikis to AbuseFilterCachingParser

https://gerrit.wikimedia.org/r/541026

Mentioned in SAL (#wikimedia-operations) [2019-10-07T22:40:21Z] <krinkle@deploy1001> Synchronized wmf-config/InitialiseSettings.php: 7b9e6829821, T156095 (duration: 00m 51s)

Krinkle updated the task description. (Show Details)
Krinkle edited projects, added Wikimedia-Performance-publish; removed Patch-For-Review.

Change 541028 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Align arg counting between the parsers

https://gerrit.wikimedia.org/r/541028