
Increase EditAttemptStep sampling rate(s) to 100%
Closed, Resolved (Public)

Description

@MNeisler, @nettrom_WMF, @cchen, and @Ottomata confirmed that increasing the sampling rate of the EditAttemptStep instrument (EAS) to 100% shouldn't be a problem.

However, we can't just add

$wgDTSchemaEditAttemptStepSamplingRate = $wgWMESchemaEditAttemptStepSamplingRate = 1; // 100%

to LocalSettings.php because the VisualEditorFeatureUse instrument (VEFU) also uses those variables. In order to be able to vary the sampling rates for EAS and VEFU independently, we'll have to:

  1. Add new sample rate variables to VEFU
  2. Set EAS sample rate variables to 100%
  3. Remove $wgDTSchemaEditAttemptStepOversample (which is always true in production) and $wgDTSchemaEditAttemptStepSamplingRate (which is always 0 in production)
    • This is optional but would simplify the answer to the question "How is VEFU sampled?"

We should keep $wgWMESchemaEditAttemptStepSamplingRate (set to 1 in step 2 above) as it can be used as an on/off switch for EAS. It can be removed after EAS is migrated to Metrics Platform.
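Steps 1 and 2 come down to each instrument reading only its own rate. Below is a minimal Python sketch of that idea. It assumes sampling is decided per editing session by hashing the session ID into [0, 1). The SamplingConfig class, the in_sample/should_log helpers and the 0.0625 VEFU rate in the usage line are hypothetical, not the extensions' actual code:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingConfig:
    # Hypothetical stand-ins for the two config variables discussed above.
    eas_sampling_rate: float   # cf. $wgWMESchemaEditAttemptStepSamplingRate
    vefu_sampling_rate: float  # cf. $wgWMESchemaVisualEditorFeatureUseSamplingRate


def in_sample(session_id: str, rate: float) -> bool:
    """Deterministically decide whether a session falls inside a sample of size `rate`."""
    if rate <= 0:
        return False
    if rate >= 1:
        return True
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Keep the top 53 bits so the bucket is an exact float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate


def should_log(config: SamplingConfig, session_id: str) -> dict[str, bool]:
    """Each instrument consults only its own rate, so the two can be tuned independently."""
    return {
        "EditAttemptStep": in_sample(session_id, config.eas_sampling_rate),
        "VisualEditorFeatureUse": in_sample(session_id, config.vefu_sampling_rate),
    }


# EAS at 100%, VEFU left at a hypothetical lower rate.
config = SamplingConfig(eas_sampling_rate=1.0, vefu_sampling_rate=0.0625)
print(should_log(config, "671c83c5e7a71d17f0df"))
```

In this sketch both decisions hash the same session ID. As a result, whenever the VEFU rate is lower, the VEFU-sampled sessions are always a subset of the EAS-sampled ones. Salting the hash per instrument would make the two samples independent instead.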

Event Timeline


Change 820412 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/extensions/WikimediaEvents@master] Add WMESchemaVisualEditorFeatureUseSamplingRate config variable

https://gerrit.wikimedia.org/r/820412

Change 820413 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/extensions/DiscussionTools@master] logger: Use wgWMESchemaVisualEditorFeatureUseSamplingRate config variable

https://gerrit.wikimedia.org/r/820413

Change 820414 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/extensions/DiscussionTools@master] EventDispatcher: Remove reference to $wgWMESchemaEditAttemptStepOversample

https://gerrit.wikimedia.org/r/820414

Change 820415 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/extensions/VisualEditor@master] Use wgWMESchemaVisualEditorFeatureUseSamplingRate config variable

https://gerrit.wikimedia.org/r/820415

Change 820412 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Add WMESchemaVisualEditorFeatureUseSamplingRate config variable

https://gerrit.wikimedia.org/r/820412

Reached out to the Editing team in the #humansoftheweb channel (is there a better place than this?) for their feedback before merging the associated patches in DiscussionTools and VisualEditor.

Tested the aforementioned patches and they all LGTM, so I will merge shortly, hopefully with the Editing team's blessing. Though I don't subscribe to silence === consent, in this case, since the patches have been languishing in review, I will be bold if crickets ensue.

Change 820413 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] logger: Use wgWMESchemaVisualEditorFeatureUseSamplingRate config variable

https://gerrit.wikimedia.org/r/820413

Change 820415 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Use wgWMESchemaVisualEditorFeatureUseSamplingRate config variable

https://gerrit.wikimedia.org/r/820415

Change 820414 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] EventDispatcher: Remove reference to $wgWMESchemaEditAttemptStepOversample

https://gerrit.wikimedia.org/r/820414

Ran the following query in Hue to cover the period since the last DT patch was merged on 10/24:

SELECT * FROM mediawiki_edit_attempt WHERE custom_data IS NOT NULL AND year=2022 AND month=10 AND day > 24 AND day < 31;

The query returned 159 rows. I'm not sure whether this is within a reasonable ballpark of the expected results. The EAS sampling rate in config ($wgWMESchemaEditAttemptStepSamplingRate) is set to 0.06 by default and to 0.2 for a handful of pilot wikis.

Also, it seems like _schema should not be null, but it is null in every row.

Here's a slice of the query results (happy to attach the full CSV, but unsure whether I have to redact some data):

| _schema | agent | custom_data | dt |
| --- | --- | --- | --- |
| NULL | {"app_install_id":null,"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"} | {"integration":{"data_type":"string","value":"discussiontools"},"loaded_timing":{"data_type":"number","value":"163"},"editing_session_id":{"data_type":"string","value":"671c83c5e7a71d17f0df"},"editor_interface":{"data_type":"string","value":"visualeditor"}} | 2022-10-25T02:02:04.592Z |
| NULL | {"app_install_id":null,"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"} | {"integration":{"data_type":"string","value":"discussiontools"},"ready_timing":{"data_type":"number","value":"160"},"editing_session_id":{"data_type":"string","value":"0ace15ce513c7e780671"},"editor_interface":{"data_type":"string","value":"visualeditor"}} | 2022-10-25T02:03:59.389Z |
| NULL | {"app_install_id":null,"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"} | {"integration":{"data_type":"string","value":"discussiontools"},"loaded_timing":{"data_type":"number","value":"95"},"editing_session_id":{"data_type":"string","value":"76e8013eb8c5d1524307"},"editor_interface":{"data_type":"string","value":"visualeditor"}} | 2022-10-25T02:07:19.803Z |
| NULL | {"app_install_id":null,"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"} | {"integration":{"data_type":"string","value":"discussiontools"},"loaded_timing":{"data_type":"number","value":"85"},"editing_session_id":{"data_type":"string","value":"0c8b6f7f5bc9e3e5b254"},"editor_interface":{"data_type":"string","value":"visualeditor"}} | 2022-10-25T02:02:50.112Z |
| NULL | {"app_install_id":null,"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"} | {"integration":{"data_type":"string","value":"discussiontools"},"loaded_timing":{"data_type":"number","value":"86"},"editing_session_id":{"data_type":"string","value":"10e7f04fb5e532e32d3c"},"editor_interface":{"data_type":"string","value":"visualeditor"}} | 2022-10-25T02:08:39.396Z |
| NULL | {"app_install_id":null,"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"} | {"integration":{"data_type":"string","value":"discussiontools"},"init_mechanism":{"data_type":"string","value":"click"},"editing_session_id":{"data_type":"string","value":"0c8b6f7f5bc9e3e5b254"},"editor_interface":{"data_type":"string","value":"visualeditor"},"init_type":{"data_type":"string","value":"page"}} | 2022-10-25T02:02:50.028Z |
| NULL | {"app_install_id":null,"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"} | {"integration":{"data_type":"string","value":"discussiontools"},"loaded_timing":{"data_type":"number","value":"1930"},"editing_session_id":{"data_type":"string","value":"0c59f30f2db47cf46809"},"editor_interface":{"data_type":"string","value":"wikitext-2017"}} | 2022-10-25T02:01:13.812Z |
| NULL | {"app_install_id":null,"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"} | {"integration":{"data_type":"string","value":"discussiontools"},"ready_timing":{"data_type":"number","value":"86"},"editing_session_id":{"data_type":"string","value":"10e7f04fb5e532e32d3c"},"editor_interface":{"data_type":"string","value":"visualeditor"}} | 2022-10-25T02:08:39.396Z |
| NULL | {"app_install_id":null,"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"} | {"integration":{"data_type":"string","value":"discussiontools"},"loaded_timing":{"data_type":"number","value":"93"},"editing_session_id":{"data_type":"string","value":"74b123d2290014ae69d5"},"editor_interface":{"data_type":"string","value":"visualeditor"}} | 2022-10-25T02:04:26.004Z |
| NULL | {"app_install_id":null,"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"} | {"integration":{"data_type":"string","value":"discussiontools"},"init_mechanism":{"data_type":"string","value":"click"},"editing_session_id":{"data_type":"string","value":"76c323bb5d553793e6d4"},"editor_interface":{"data_type":"string","value":"visualeditor"},"init_type":{"data_type":"string","value":"page"}} | 2022-10-25T02:04:03.989Z |
| NULL | {"app_install_id":null,"client_platform":"mediawiki_js","client_platform_family":"desktop_browser"} | {"integration":{"data_type":"string","value":"discussiontools"},"loaded_timing":{"data_type":"number","value":"160"},"editing_session_id":{"data_type":"string","value":"0ace15ce513c7e780671"},"editor_interface":{"data_type":"string","value":"visualeditor"}} | 2022-10-25T02:03:59.390Z |

FWIW, extending the query to year=2022 AND month=10 AND day > 20 AND day < 31 returned only 323 rows.

The values for integration are discussiontools and page, while those for editor_interface are visualeditor and wikitext.
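In the sample rows, each custom_data field is wrapped as {"data_type": ..., "value": ...}, and numbers are serialized as strings. Below is a small Python sketch that flattens those values and tallies integration or editor_interface across exported rows. It assumes the column holds exactly that JSON shape. The helper names are hypothetical, and the two inputs are copied from the sample rows above:

```python
import json
from collections import Counter

# Two custom_data values copied from the sample rows above.
SAMPLE_CUSTOM_DATA = [
    '{"integration":{"data_type":"string","value":"discussiontools"},"loaded_timing":{"data_type":"number","value":"163"},"editing_session_id":{"data_type":"string","value":"671c83c5e7a71d17f0df"},"editor_interface":{"data_type":"string","value":"visualeditor"}}',
    '{"integration":{"data_type":"string","value":"discussiontools"},"loaded_timing":{"data_type":"number","value":"1930"},"editing_session_id":{"data_type":"string","value":"0c59f30f2db47cf46809"},"editor_interface":{"data_type":"string","value":"wikitext-2017"}}',
]


def flatten_custom_data(raw: str) -> dict:
    """Turn {"key": {"data_type": ..., "value": ...}} into {"key": value}."""
    flat = {}
    for key, field in json.loads(raw).items():
        value = field["value"]
        # Numbers arrive as strings in these rows; converting them back is an assumption.
        if field["data_type"] == "number":
            value = float(value)
        flat[key] = value
    return flat


def tally(raw_rows, key: str) -> Counter:
    """Count how often each value of `key` appears across rows."""
    return Counter(flatten_custom_data(raw).get(key) for raw in raw_rows)


print(tally(SAMPLE_CUSTOM_DATA, "integration"))  # Counter({'discussiontools': 2})
print(tally(SAMPLE_CUSTOM_DATA, "editor_interface"))
```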

For what it's worth, because of the deployment train, changes from the last few patches wouldn't actually have taken effect until October 27th.

I suppose that works out to ~5.2k edit attempts once sampling is accounted for, or ~15k for the whole month. If that's across all projects and all edit methods, it seems much lower than expected.
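The back-of-the-envelope estimate can be reproduced as follows. The 323 rows cover day > 20 AND day < 31, i.e. October 21 to 30 (10 days). Dividing by 0.0625 (the 6.25% default cited on DataHub later in this thread) gives the ~5.2k figure. Scaling those 10 days up to a 30-day month gives ~15k. This is a sketch under those assumptions. It treats every row as coming from a default-rate wiki, so rows from the 0.2 pilot wikis are overweighted:

```python
def estimate_unsampled(sampled_rows: int, sampling_rate: float) -> float:
    """Scale a sampled count back up to an estimate of the full population."""
    return sampled_rows / sampling_rate


def scale_to_period(count: float, observed_days: int, target_days: int) -> float:
    """Linearly extrapolate a count from one window length to another."""
    return count * target_days / observed_days


rows = 323             # day > 20 AND day < 31, i.e. Oct 21 to Oct 30 (10 days)
default_rate = 0.0625  # assumed default rate (6.25%); some pilot wikis used 0.2

unsampled = estimate_unsampled(rows, default_rate)
monthly = scale_to_period(unsampled, observed_days=10, target_days=30)
print(unsampled, monthly)  # 5168.0 15504.0
```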

> Also, seems like _schema should not be null which is the case for all rows.

T255818: Refine drops $schema field values

ok - so it turns out the edit_attempt stream sample rate is configured to 1 for testwiki only, which explains why we have so few query results. We'll apply that sampling rate to all group0 wikis and expect to see an increase in rows returned starting tomorrow, if I can get the config deployed this afternoon.
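Here is a simplified model of why only testwiki saw full-rate data. Per-wiki settings are looked up from the most specific key first: the wiki itself, then a group tag such as group0, then a default. This is a hypothetical Python sketch, not mediawiki-config's actual resolution code. "examplewiki" is a made-up group0 wiki, and None stands for "no override, fall back to whatever the stream's default is":

```python
from typing import Mapping, Optional, Sequence


def resolve_setting(values: Mapping[str, float], wiki: str, tags: Sequence[str]) -> Optional[float]:
    """Return the most specific value: exact wiki, then each group tag in order, then 'default'."""
    if wiki in values:
        return values[wiki]
    for tag in tags:
        if tag in values:
            return values[tag]
    # No 'default' key in this toy config means None (not modelled further here).
    return values.get("default")


before = {"testwiki": 1.0}                 # only testwiki overridden
after = {"testwiki": 1.0, "group0": 1.0}   # every group0 wiki overridden

print(resolve_setting(before, "examplewiki", ["group0"]))  # None
print(resolve_setting(after, "examplewiki", ["group0"]))   # 1.0
```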

Change 851109 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[operations/mediawiki-config@master] Update sample rate for edit attempt stream to 1 for group 0.

https://gerrit.wikimedia.org/r/851109

Change 851109 merged by jenkins-bot:

[operations/mediawiki-config@master] Update sample rate for edit attempt stream to 1 for group 0.

https://gerrit.wikimedia.org/r/851109

Mentioned in SAL (#wikimedia-operations) [2022-10-31T20:12:44Z] <cjming@deploy1002> Started scap: Backport for [[gerrit:851109|Update sample rate for edit attempt stream to 1 for group 0. (T312016)]]

Mentioned in SAL (#wikimedia-operations) [2022-10-31T20:13:03Z] <cjming@deploy1002> cjming and cjming: Backport for [[gerrit:851109|Update sample rate for edit attempt stream to 1 for group 0. (T312016)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-10-31T20:17:01Z] <cjming@deploy1002> Finished scap: Backport for [[gerrit:851109|Update sample rate for edit attempt stream to 1 for group 0. (T312016)]] (duration: 04m 17s)

Will verify data tomorrow since edit_attempt_step sample rate was bumped up to 1 for group0 wikis this afternoon.

We're also monitoring the event rate for the VEFU stream (eventlogging_VisualEditorFeatureUse). So far, I don't see any discernible change in the event rate since last Thursday: https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1&var-schema=VisualEditorFeatureUse&from=1666742400000&to=1667260799000

🎉

Change 851652 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[operations/mediawiki-config@master] Update Edit Attempt Step sampling rate to 1 for group 0 wikis

https://gerrit.wikimedia.org/r/851652

☝️ @cjming and I discussed this out of band, but it's worth mentioning here too for posterity: $wgWMESchemaEditAttemptStepSamplingRate will also need to be set to 1 on group0 wikis.
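Two separate settings gate these events. One is the edit_attempt stream's sample rate (changed in 851109). The other is the instrument's own $wgWMESchemaEditAttemptStepSamplingRate (changed in 851652). The minimal sketch below shows why both must be 1 before every session is logged. It assumes each gate makes its own hashed per-session decision, and the salts and helper names are hypothetical:

```python
import hashlib


def bucket(session_id: str, salt: str) -> float:
    """Map (salt, session_id) to a deterministic float in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{session_id}".encode("utf-8")).digest()
    # Top 53 bits, so the largest bucket is still strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def is_logged(session_id: str, stream_rate: float, instrument_rate: float) -> bool:
    """An event reaches the stream only if it passes BOTH sampling gates."""
    passes_stream = bucket(session_id, "stream") < stream_rate
    passes_instrument = bucket(session_id, "instrument") < instrument_rate
    return passes_stream and passes_instrument


sessions = [f"session-{i}" for i in range(10_000)]
for stream_rate, instrument_rate in [(1.0, 0.0625), (1.0, 1.0)]:
    share = sum(is_logged(s, stream_rate, instrument_rate) for s in sessions) / len(sessions)
    print(stream_rate, instrument_rate, share)
```

With independent gates the logged share is roughly the product of the two rates. If the real gates instead share one session token and bucket, a session passes both exactly when its bucket is below the smaller rate, so the share is min(stream_rate, instrument_rate). Either way, 100% requires both settings to be 1.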

Change 851652 merged by jenkins-bot:

[operations/mediawiki-config@master] Update Edit Attempt Step sampling rate to 1 for group 0 wikis

https://gerrit.wikimedia.org/r/851652

Mentioned in SAL (#wikimedia-operations) [2022-11-01T20:29:44Z] <cjming@deploy1002> Started scap: Backport for [[gerrit:851652|Update Edit Attempt Step sampling rate to 1 for group 0 wikis (T312016)]]

Mentioned in SAL (#wikimedia-operations) [2022-11-01T20:29:53Z] <cjming@deploy1002> cjming and cjming: Backport for [[gerrit:851652|Update Edit Attempt Step sampling rate to 1 for group 0 wikis (T312016)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-11-01T20:33:24Z] <cjming@deploy1002> Finished scap: Backport for [[gerrit:851652|Update Edit Attempt Step sampling rate to 1 for group 0 wikis (T312016)]] (duration: 04m 29s)

Change 854005 had a related patch set uploaded (by Phuedx; author: Phuedx):

[operations/mediawiki-config@master] wgWMESchemaEditAttemptStepSamplingRate to 1 on group1 wikis

https://gerrit.wikimedia.org/r/854005

Change 854006 had a related patch set uploaded (by Phuedx; author: Phuedx):

[operations/mediawiki-config@master] wgWMESchemaEditAttemptStepSamplingRate to 1 everywhere

https://gerrit.wikimedia.org/r/854006

Change 854005 merged by jenkins-bot:

[operations/mediawiki-config@master] EditAttemptStep sampling rate to 1 for group1 wikis

https://gerrit.wikimedia.org/r/854005

Mentioned in SAL (#wikimedia-operations) [2022-11-08T08:30:13Z] <kartik@deploy1002> Started scap: Backport for [[gerrit:854005|EditAttemptStep sampling rate to 1 for group1 wikis (T312016)]]

Mentioned in SAL (#wikimedia-operations) [2022-11-08T08:30:33Z] <kartik@deploy1002> kartik and phuedx: Backport for [[gerrit:854005|EditAttemptStep sampling rate to 1 for group1 wikis (T312016)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet

Change 854475 had a related patch set uploaded (by Phuedx; author: Phuedx):

[operations/mediawiki-config@master] EditAttemptStep sampling rate to 1 for group1 wikis

https://gerrit.wikimedia.org/r/854475

Change 854570 had a related patch set uploaded (by Phuedx; author: Phuedx):

[operations/mediawiki-config@master] EditAttemptStep sampling rate to 1 everywhere

https://gerrit.wikimedia.org/r/854570

Change 854475 merged by jenkins-bot:

[operations/mediawiki-config@master] EditAttemptStep sampling rate to 1 for group1 wikis

https://gerrit.wikimedia.org/r/854475

Mentioned in SAL (#wikimedia-operations) [2022-11-09T08:12:56Z] <kartik@deploy1002> Started scap: Backport for [[gerrit:854475|EditAttemptStep sampling rate to 1 for group1 wikis (T312016)]]

Mentioned in SAL (#wikimedia-operations) [2022-11-09T08:13:18Z] <kartik@deploy1002> kartik and phuedx: Backport for [[gerrit:854475|EditAttemptStep sampling rate to 1 for group1 wikis (T312016)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-11-09T08:21:07Z] <kartik@deploy1002> Finished scap: Backport for [[gerrit:854475|EditAttemptStep sampling rate to 1 for group1 wikis (T312016)]] (duration: 08m 10s)

Change 854006 abandoned by Phuedx:

[operations/mediawiki-config@master] wgWMESchemaEditAttemptStepSamplingRate to 1 everywhere

Reason:

I373b741156939c4e2c820ba0bd538c40a1ec45fb

https://gerrit.wikimedia.org/r/854006

Change 854570 merged by jenkins-bot:

[operations/mediawiki-config@master] EditAttemptStep sampling rate to 1 everywhere

https://gerrit.wikimedia.org/r/854570

Mentioned in SAL (#wikimedia-operations) [2022-11-15T21:03:54Z] <cjming@deploy1002> Started scap: Backport for [[gerrit:854570|EditAttemptStep sampling rate to 1 everywhere (T312016)]]

Mentioned in SAL (#wikimedia-operations) [2022-11-15T21:04:18Z] <cjming@deploy1002> cjming and phuedx: Backport for [[gerrit:854570|EditAttemptStep sampling rate to 1 everywhere (T312016)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-11-15T21:08:40Z] <cjming@deploy1002> Finished scap: Backport for [[gerrit:854570|EditAttemptStep sampling rate to 1 everywhere (T312016)]] (duration: 04m 45s)

There's been a significant increase in the number of events flowing on the mediawiki.edit_attempt and eventlogging_EditAttemptStep streams (see https://grafana.wikimedia.org/d/ZB39Izmnz/eventgate?orgId=1&viewPanel=74&from=1668542400000&to=1668643199000) with no corresponding increase in the number of validation errors (see https://logstash.wikimedia.org/goto/66b8b50eba0db999c6843b5594635a1e) 🎉🎉🎉

Note well that the flow rate on the mediawiki.edit_attempt stream will only match that of the eventlogging_EditAttemptStep stream after T309985: Migrate WikiEditor EditAttemptStep instrument to Metrics Platform is resolved and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiEditor/+/855990/ has been merged.

Removing inactive assignee (please do so as part of team offboarding!).

phuedx claimed this task.

Being bold.

So is sampling rate 100% everywhere, then? Can someone please confirm and update event.editattemptstep documentation on DataHub? Because that page still says it's 6.25% of all editing sessions by default and 20% on some wikis.

hi @mpopov -- I believe this is true based on prod config:
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/refs/heads/master/wmf-config/ext-EventStreamConfig.php#1036

I'm happy to update that page on DataHub and will follow up shortly with edits. Update: it turns out I don't have edit rights.

> So is sampling rate 100% everywhere, then? Can someone please confirm and update event.editattemptstep documentation on DataHub?

Done™. I've kept the information about the complex sampling rates and oversampling mechanisms, and added a link to where I also documented it on Phab.

Thank you, @phuedx! (And welcome back!)

@cjming Aw! :( I've asked in #data-catalog for DE to hook you up with proper permissions.

> I've asked in #data-catalog for DE to hook you up with proper permissions.

thanks @mpopov - i can edit now \o/