
Implement topic subscription instrumentation
Open, High, Public

Description

This task represents the work involved in implementing the new instrumentation that was decided upon in T277349.

Instrumentation Requirements

  1. Implement all changes described in the talk_page_event spec.
    • Note: this new instrumentation will be used to track published edits on all talk page namespaces, including new topics, comments, and responses made using either page editing methods or the Reply Tool and New Discussion Tool.
  2. Re-enable all notification-related events currently tracked in the Echo schema.
    • Note: All notification-related events will be tracked using existing instrumentation in EchoInteraction and Echo.

Open questions

  1. How, if at all, might the instrumentation implementation be affected by our decision to limit the initial implementation of Automatic Topic Subscriptions to DiscussionTools interfaces?

Instrumentation QA instructions

Done

  • The new instrumentation that is needed is defined and documented in this ticket
  • The new instrumentation is implemented
  • A document/reference is linked that contains all of the actions Editing QA ought to test and the corresponding events they should expect to be emitted in the browser's console
  • QA has verified the new instrumentation/events are being emitted from clients in expected ways
  • Tickets are filed for any unexpected behavior
  • New instrumentation is documented on-wiki
    • Two notes: 1) it is TBD where exactly this page will exist and what information it will contain, and 2) within this page, we'll likely need to document events that are shared across interfaces (e.g. Log save_success_timing from Schema:EditAttemptStep can be used in conjunction with Schema:VisualEditorFeatureUse).

Event Timeline

ppelberg created this task.
ppelberg edited projects, added Editing-team (Tracking); removed Editing-team.
ppelberg moved this task from Backlog to Analytics on the Editing-team (Tracking) board.
MNeisler edited projects, added Product-Analytics (Kanban); removed Product-Analytics.

@LZaman

Attached is the documentation of the proposed talk_page_event schema [i], which can be forwarded to legal for their review. The attached documentation includes a description of the new schema, the purpose for collecting these events, and a description of all the fields that will be tracked.

In your request to legal, you can also let them know that I am available to walk through the new schema in a meeting if they prefer (They've sometimes requested this in the past to help understand the type of data being collected). Let me know if there's anything else you need for the request.

(cc @ppelberg )


i. Note: This reflects the same fields documented in the instrumentation spec but presented in a clearer format for legal review and without the details on existing instrumentation and metrics that will be used in the project.

I'm assigning this over to @DLynch to verify the task description does not contain anything that is unexpected.

Note: implementation is blocked on legal review (T288695).

Change 731333 had a related patch set uploaded (by DLynch; author: DLynch):

[schemas/event/secondary@master] talk_page_event schema

https://gerrit.wikimedia.org/r/731333

Change 731334 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/DiscussionTools@master] Logging for new comments

https://gerrit.wikimedia.org/r/731334

Change 731854 had a related patch set uploaded (by DLynch; author: DLynch):

[operations/mediawiki-config@master] Add event stream config for discussiontools

https://gerrit.wikimedia.org/r/731854

Change 731860 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] Pass editingSessionId through to API save requests

https://gerrit.wikimedia.org/r/731860

Change 731333 merged by jenkins-bot:

[schemas/event/secondary@master] talk_page_edit schema

https://gerrit.wikimedia.org/r/731333

Change 731860 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Pass editingSessionId through to API save requests

https://gerrit.wikimedia.org/r/731860

Change 731334 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Logging for new comments

https://gerrit.wikimedia.org/r/731334

Change 731854 merged by jenkins-bot:

[operations/mediawiki-config@master] Add event stream config for discussiontools

https://gerrit.wikimedia.org/r/731854

Mentioned in SAL (#wikimedia-operations) [2021-11-02T00:02:17Z] <legoktm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Add event stream config for discussiontools (T286076) (duration: 00m 55s)

QA has verified the new instrumentation/events are being emitted from clients in expected ways

Per what @DLynch shared in today's (1 November) team standup, the above is not necessary because the new instrumentation added as part of this task only affects server-side code. cc @Ryasmeen

The config patch that merged this evening was necessary for the deployment: it just hooks up the event stream so that the logging knows what to do with events once they start appearing when the train goes out.

I've updated the talk_page_edit schema spec document that Megan provided so that it uses the actual field names we wound up with.

Hi @MNeisler ! This is ready for you to QA by Nov 4.

Hi Megan, reassigning this to you!

I've completed a quick check and confirmed that the new mediawiki_talk_page_edit schema is logging events as of 3 November 2021. To date, we've logged about 97,543 events across all wikis.

@DLynch I noticed on logstash that we are getting some validation errors around the performer.user_edit_count_bucket field. See dashboard and details below:

'.performer.user_edit_count_bucket' should be string, '.performer.user_edit_count_bucket' should be equal to one of the allowed values

This appears to occur for events by anon users where the performer.user_edit_count_bucket is assigned to null.

'.performer.user_edit_count_bucket' should be equal to one of the allowed values

This occurs for logged-in users that fall into the 1-4 edit bucket count. Taking a look at the schema, it looks like there might be an extra space in the '1-4 edits ' enum value that is causing this.
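To illustrate why anonymous and 1-4-edit events produce these two different messages, here is a minimal Python sketch that hand-rolls a JSON Schema check for a field declared as {"type": "string", "enum": [...]}. It is not the actual event validator, and the bucket labels other than '1-4 edits ' are assumptions for illustration.

```python
# Hypothetical copy of the enum as described in this task: note the trailing
# space in "1-4 edits ". The other bucket labels are assumptions.
BUGGY_ENUM = ["0 edits", "1-4 edits ", "5-99 edits", "100-999 edits", "1000+ edits"]


def validate_bucket(value, allowed):
    """Return the errors a JSON Schema validator would report for a field
    declared as {"type": "string", "enum": allowed}."""
    errors = []
    if not isinstance(value, str):
        errors.append("should be string")
    # Enum matching is exact equality, so "1-4 edits" != "1-4 edits ".
    if value not in allowed:
        errors.append("should be equal to one of the allowed values")
    return errors


# Anonymous editor: the bucket is null (None), so both checks fail.
print(validate_bucket(None, BUGGY_ENUM))
# -> ['should be string', 'should be equal to one of the allowed values']

# Logged-in editor with 1-4 edits: the value is right, the enum entry is not.
print(validate_bucket("1-4 edits", BUGGY_ENUM))
# -> ['should be equal to one of the allowed values']

# The same value passes once the trailing space is removed from the enum.
print(validate_bucket("1-4 edits", [v.strip() for v in BUGGY_ENUM]))
# -> []
```

The two outputs line up with the two distinct Logstash messages quoted above, which is why the null case and the 1-4 case show up as separate errors.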

Keeping this assigned to me right now as I'm still double-checking that the other fields are populating as expected, but at a quick glance, everything else looks good. I will follow up on this task once I've completed the rest of the QA.

@DLynch I've finished QA of the new mediawiki_talk_page_edit schema. See full summary of any potential issues below:

  • We are not logging any events for anon users identified by either performer.user_is_anonymous = TRUE or performer.user_id = 0. It looks like this might be related to a validation error I found in logstash. See dashboard and details below:

'.performer.user_edit_count_bucket' should be string, '.performer.user_edit_count_bucket' should be equal to one of the allowed values

This occurs for events by anon users where the performer.user_edit_count_bucket is assigned to null.

  • We are not logging any events for users in the 1-4 edit count group.

Related logstash error:

'.performer.user_edit_count_bucket' should be equal to one of the allowed values

This occurs only for logged-in users that fall into the 1-4 edit bucket count. Taking a look at the schema, it looks like there might be an extra space in the '1-4 edits ' enum.

  • We are logging both component_type = comment and component_type = response events as expected. Is there any instance we would expect to see component_type = topic? If not, I'd recommend removing this from the enum list.
  • Not all integration = page sessions recorded in mediawiki_talk_page_edit are also recorded in the EditAttemptStep schema. I believe this is because integration = page events are sampled in EditAttemptStep. Would it be possible to make sure all mediawiki_talk_page_edit events are also sampled in EditAttemptStep?

Note: This is not an issue for integration = discussiontools because these are sampled at 100%.

@DLynch - Assigning over to you to review and implement any needed changes, but let me know if you have any questions about the above.

Taking a look at the schema, it looks like there might be an extra space in the '1-4 edits ' enum.

Amusingly enough, this is because I copied that entire field-definition from the existing content_translation_event schema... which thus also suffers from this problem. 😁

This occurs for events by anon users where the performer.user_edit_count_bucket is assigned to null.

I think this would also occur for events in content_translation_event. Would you like me to make it log them as "0 edits" even though that's arguably inaccurate?

Is there any instance we would expect to see component_type = topic? If not, I'd recommend removing this from the enum list.

Nope, nothing would currently cause that. I only put it in the enum because it was in the spec and I figured you might have some future reason for it to be there.

Would it be possible to make sure all mediawiki_talk_page_edit events are also sampled in EditAttemptStep?

I think what I need to do here is reduce how many events are being sampled for integration = page: I can see that the current override for integration = discussiontools is also applying to it, so I should make that override apply only where it's meant to. I don't think I can guarantee a 1:1 correspondence between EditAttemptStep and talk_page_edit sessions, because some data isn't available when the talk_page_edit logging happens, but it can be closer.
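A minimal sketch of the sampling mismatch, under hypothetical assumptions not stated in this task: both loggers make a deterministic 1-in-N decision from the shared editing session ID, and the rates are 1-in-1 for discussiontools and 1-in-16 for page. When the discussiontools override leaks into page sessions, talk_page_edit logs sessions that EditAttemptStep skipped; scoping the override to its own integration closes the gap.

```python
import hashlib

# Hypothetical 1-in-N rates; the real values are not given in this task.
SAMPLE_RATES = {"discussiontools": 1, "page": 16}


def session_id(n):
    """Deterministic stand-in for an editing session ID (hex string)."""
    return hashlib.sha1(str(n).encode()).hexdigest()


def in_sample(sid, one_in_n):
    """Deterministic 1-in-N sampling keyed on the session ID."""
    return int(sid[:8], 16) % one_in_n == 0


def eas_logs(sid, integration):
    return in_sample(sid, SAMPLE_RATES[integration])


def talk_page_edit_logs_buggy(sid, integration):
    # Bug: the discussiontools override applies to every integration.
    return in_sample(sid, SAMPLE_RATES["discussiontools"])


def talk_page_edit_logs_fixed(sid, integration):
    # Fix: each integration uses its own rate, matching EditAttemptStep.
    return in_sample(sid, SAMPLE_RATES[integration])


def missing_from_eas(logger, integration, n=1000):
    """Count sessions logged by talk_page_edit but absent from EditAttemptStep."""
    return sum(
        1
        for i in range(n)
        if logger(session_id(i), integration)
        and not eas_logs(session_id(i), integration)
    )


print(missing_from_eas(talk_page_edit_logs_buggy, "page"))  # most of the 1000 sessions
print(missing_from_eas(talk_page_edit_logs_fixed, "page"))  # 0
print(missing_from_eas(talk_page_edit_logs_fixed, "discussiontools"))  # 0
```

This sketch ignores the data-availability caveat above, which is one reason real-world overlap may still fall short of exactly 1:1.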

Taking a look at the schema, it looks like there might be an extra space in the '1-4 edits ' enum.

Amusingly enough, this is because I copied that entire field-definition from the existing content_translation_event schema... which thus also suffers from this problem. 😁

The good news is that, through manual materialization, we can fix the typos in both schemas without bumping the version. Structurally nothing changes, so if we delete all the files except current.yaml, update that file (without changing the version it specifies), and run ./node_modules/.bin/jsonschema_tools materialize jsonschema/analytics/mediawiki/talk_page_edit/current.yaml it'll be fine.

This occurs for events by anon users where the performer.user_edit_count_bucket is assigned to null.

I think this would also occur for events in content_translation_event. Would you like me to make it log them as "0 edits" even though that's arguably inaccurate?

I would recommend adding an explicit 'N/A' to the enum to use with IP editors. This too can be a same-version modification (via the process described above). Quick note: if we want to make the addition through a new version then it would have to be appended to the end of the enum.
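A minimal sketch of the suggested fix, using hypothetical names: IP editors map to an explicit 'N/A' bucket instead of null, and a helper checks the rule above that a new schema version may only append values to the end of the enum. The bucket boundaries other than 0 and 1-4 are assumptions, and the sketch assumes the trailing-space typo has already been fixed in place.

```python
from typing import List, Optional

# Assumed enum after the same-version typo fix (no trailing space).
V1_ENUM = ["0 edits", "1-4 edits", "5-99 edits", "100-999 edits", "1000+ edits"]
# A new version may only add values at the end.
V2_ENUM = V1_ENUM + ["N/A"]


def edit_count_bucket(edit_count: Optional[int]) -> str:
    """Map an edit count to a bucket label; None means an anonymous (IP) editor."""
    if edit_count is None:
        return "N/A"
    if edit_count == 0:
        return "0 edits"
    if edit_count < 5:
        return "1-4 edits"
    if edit_count < 100:
        return "5-99 edits"
    if edit_count < 1000:
        return "100-999 edits"
    return "1000+ edits"


def is_append_only(old: List[str], new: List[str]) -> bool:
    """True if `new` keeps every value of `old` in place and only adds at the end."""
    return len(new) >= len(old) and new[: len(old)] == old


print(edit_count_bucket(None))  # N/A
print(edit_count_bucket(3))  # 1-4 edits
print(is_append_only(V1_ENUM, V2_ENUM))  # True
print(is_append_only(V1_ENUM, ["N/A"] + V1_ENUM))  # False
```

Every value the helper can return is a member of V2_ENUM, so neither anonymous nor 1-4-edit events would hit the validation errors reported above.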

Change 739865 had a related patch set uploaded (by DLynch; author: DLynch):

[schemas/event/secondary@master] Update talk_page_edit schema

https://gerrit.wikimedia.org/r/739865

Change 739884 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/DiscussionTools@master] Fixes for talk_page_edit logging

https://gerrit.wikimedia.org/r/739884

Change 739865 merged by jenkins-bot:

[schemas/event/secondary@master] Update talk_page_edit schema

https://gerrit.wikimedia.org/r/739865

Thanks @DLynch! The changes look good to me.

I'll do a final QA check once the new changes are deployed to make sure that anon and 1-4 edit count bucket events are now logging correctly.

Change 739884 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Fixes for talk_page_edit logging

https://gerrit.wikimedia.org/r/739884