
Drop "non bot" condition from ORES changeprop rules
Closed, Resolved (Public)

Description

Stop filtering out bot edits in the ORES changeprop rules. We've decided that we do want to score enwiki and wikidata bot edits, which are by far the largest volume of bot editing, so simplifying the config is probably worth the cost.

Potential impacts

  • Up to a 3-fold increase in changeprop traffic (citation needed). Changeprop messages contain only metadata and are in the low-kB size range. Assuming a baseline of 700 req/min and 2kB per message, a 3-fold increase adds at most 1,400 messages/min, or 2.8MB/min, to changeprop.
  • We'd be computing scores for most of this additional volume, so expect a corresponding increase in load on the scoring workers.
  • Any additional precached scores will be available to external requests, and cache entries would expire sooner, so we might see changes in the cache hit ratio.
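
The traffic estimate in the first bullet can be checked with a few lines of arithmetic. This is only a sketch: it treats 700 req/min as the current baseline (implied by the bullet, not measured here), takes the 3-fold figure as an upper bound, and uses 1MB = 1,000kB. The function and constant names are made up for illustration.

```python
# Figures quoted in the task description; all are rough estimates.
BASELINE_REQ_PER_MIN = 700  # assumed current changeprop rate for ORES
MAX_GROWTH_FACTOR = 3       # "up to a 3-fold increase" (citation needed)
MESSAGE_SIZE_KB = 2         # "low-kB size range"


def added_load(baseline_per_min, growth_factor, message_size_kb):
    """Return (extra messages/min, extra MB/min) for a given growth factor."""
    added_messages = baseline_per_min * (growth_factor - 1)
    added_mb = added_messages * message_size_kb / 1000
    return added_messages, added_mb


added_messages, added_mb = added_load(
    BASELINE_REQ_PER_MIN, MAX_GROWTH_FACTOR, MESSAGE_SIZE_KB
)
print(f"{added_messages} extra messages/min, {added_mb} MB/min")
```

Under these assumptions the result matches the 1,400 messages/min and 2.8MB/min quoted above.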

Event Timeline

awight triaged this task as Medium priority. Feb 21 2018, 7:15 PM
awight created this task.

@awight are you computing scores for them? If not, it would be good to keep the condition in ChangeProp in order to reduce useless network traffic and noise.

In IRC, we determined this is the right thing to do. We do want to compute scores on bot edits in enwiki (wp10 model) and wikidata (itemquality model). For the other wikis, bot traffic is low-ish (~10%?), but we're not going to compute scores for bot edits. I'll see if I can write changeprop rules to filter accordingly.
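
As a rough illustration of the per-wiki rule described in this comment, the filter could be expressed as a predicate like the one below. It is a Python stand-in, not changeprop's actual rule syntax. The event field names (`database`, `performer.user_is_bot`), the `wikidatawiki` database name, and the `should_score` helper are assumptions made for the sketch.

```python
from typing import Any, Mapping

# Wikis whose bot edits would still be scored under the rule in this comment.
# The database names are assumptions for illustration.
BOT_SCORED_WIKIS = frozenset({"enwiki", "wikidatawiki"})


def should_score(event: Mapping[str, Any]) -> bool:
    """Decide whether an edit event should be forwarded for scoring.

    Non-bot edits always pass; bot edits pass only on the listed wikis.
    """
    performer = event.get("performer") or {}
    if not performer.get("user_is_bot", False):
        return True
    return event.get("database") in BOT_SCORED_WIKIS


# Hypothetical events, shaped only as far as this sketch needs.
print(should_score({"database": "enwiki", "performer": {"user_is_bot": True}}))
print(should_score({"database": "frwiki", "performer": {"user_is_bot": True}}))
```

Dropping the "non bot" condition entirely, as the task title proposes, corresponds to this predicate always returning true.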

Our team still needs to discuss the details.

What do we need to discuss here? Is bandwidth an issue?

Change 424145 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/services/change-propagation/deploy@master] Don't filter bots from the ORES stream

https://gerrit.wikimedia.org/r/424145

Change 424146 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/services/ores/deploy@master] Include bot edits in precaching wikidata itemquality

https://gerrit.wikimedia.org/r/424146

We discussed this in sync meeting and confirmed that we want to move forward with it -- given very careful monitoring.

Should deploy changeprop part first (and monitor network traffic) and the ORES part second (and monitor scoring load).

Everything discussed so far still roughly holds, but I re-read the deployed ORES config, and we'll actually be scoring a much higher volume as soon as the changeprop patch is deployed, because most of our wikis aren't filtered on the ORES side. This deployment is tentatively scheduled for tomorrow morning.

Change 424145 merged by Ppchelko:
[mediawiki/services/change-propagation/deploy@master] Don't filter bots from the ORES stream

https://gerrit.wikimedia.org/r/424145

Mentioned in SAL (#wikimedia-operations) [2018-04-30T20:05:20Z] <ppchelko@tin> Started deploy [changeprop/deploy@8cd45ed]: Don't filter bots from the ORES stream T187927

Mentioned in SAL (#wikimedia-operations) [2018-04-30T20:06:36Z] <ppchelko@tin> Finished deploy [changeprop/deploy@8cd45ed]: Don't filter bots from the ORES stream T187927 (duration: 01m 15s)

Change 424146 merged by Awight:
[mediawiki/services/ores/deploy@master] Include bot edits in precaching wikidata itemquality

https://gerrit.wikimedia.org/r/424146

Mentioned in SAL (#wikimedia-operations) [2018-04-30T20:22:17Z] <awight@tin> Started deploy [ores/deploy@bf182e2]: ORES: Include bot edits in precaching wikidata itemquality; T187927

Looks like these two changes increased our precache scoring volume by roughly 50%, but it's hard to say.

awight mentioned this in Unknown Object (Phame Post). May 2 2018, 6:41 PM