Add a link: create new Schema
Open, High, Public

Description

We'll need a new schema to log interactions with the add link plugin and its various dialogs. We could make something specific to this project, like StructuredTaskLinkRecommendation, or something more generic that could be reused for other structured tasks. We should clarify the approach before starting on this work.

Event Timeline

kostajh triaged this task as High priority.Mar 22 2021, 7:57 PM
kostajh created this task.
kostajh moved this task from Backlog to Post-release backlog on the Add-Link board.

This blocks most of the other instrumentation tasks.

MMiller_WMF renamed this task from Create new Schema to Add a link: create new Schema.Mar 23 2021, 4:27 AM

Change 681052 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[schemas/event/secondary@master] [WIP] Create structuredtask/article/edit schema

https://gerrit.wikimedia.org/r/681052

I have some feedback on my patch, so I will take another pass at updating the schema today or tomorrow.

@kostajh : I've reviewed the proposed schema and compared it to the measurement specification and the relevant phab tasks for those sections. As I expected there was not a lot to bring up, excellent work as usual! Here are my questions:

  1. Onboarding, user clicks the "Learn more" link. I'd expect that to be captured as an action="link_click" but what is action_data set to? And in what interfaces is that available? There's some documentation of how action_data is set but not for the onboarding steps.
  2. Which action corresponds to the user "Opening up the blue link to view the full article of the link target"? Is that action=view_help, or something else?
  3. I'd like to confirm what's logged when the user gets to the Edit summary screen. From what I understand there are two paths to it: 1) advancing/skipping from the last link suggestion, which I think means action=(next|accept|reject) on the last link, and 2) clicking "Publish/Submit", which I think means action=submit. Did I get that correct?

Apart from that, I'm wondering if we can store page_title as well as page_id? We of course want the latter because it doesn't change when pages move, but we might want to store the former to make it easier to look up pages. I'm happy to find a page title using the MariaDB command line, but it can also be quicker to copy & paste page titles from event data in the Data Lake. @MMiller_WMF , what do you think?

@nettrom_WMF -- I have nothing to add about your questions, but I actually do think it could be okay to skip page_title if @kostajh would rather skip it. Just because I think it's been pretty rare that we actually have looked up specific pages in our event logging data. But if you like to do it, then by all means let's record it.

It should be trivial to include page_title, so I'll include it.

@nettrom_WMF thanks for your comments, I'll go through the draft schema and make some updates, then I'll get back to you on your questions.

@kostajh : I've reviewed the proposed schema and compared it to the measurement specification and the relevant phab tasks for those sections. As I expected there was not a lot to bring up, excellent work as usual! Here are my questions:

  1. Onboarding, user clicks the "Learn more" link. I'd expect that to be captured as an action="link_click" but what is action_data set to? And in what interfaces is that available? There's some documentation of how action_data is set but not for the onboarding steps.

I've proposed it like this:

For action=link_click in the onboarding_step_1:
         - dont_show_again is true or false

So, action=link_click and active_interface is onboarding_step_1, and the additional action_data is whether the "Don't show again" box is checked or unchecked. Does that sound OK?
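
As a rough illustration of the proposal above, the event for a "Learn more" click during onboarding might be assembled like this. It is only a sketch in Python: the helper name build_link_click_event is hypothetical, and serializing action_data as key=value text is an assumption rather than something taken from the patch.

```python
def build_link_click_event(dont_show_again: bool) -> dict:
    """Hypothetical helper: fields for a "Learn more" click in onboarding step 1."""
    return {
        "action": "link_click",
        "active_interface": "onboarding_step_1",
        # Assumption: action_data is serialized as key=value text.
        "action_data": f"dont_show_again={'true' if dont_show_again else 'false'}",
    }


event = build_link_click_event(dont_show_again=True)
print(event["action_data"])  # dont_show_again=true
```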

  2. Which action corresponds to the user "Opening up the blue link to view the full article of the link target"? Is that action=view_help, or something else?

action=view_help is the ? icon in the mobile link inspector, I'll clarify that.

  3. I'd like to confirm what's logged when the user gets to the Edit summary screen. From what I understand there are two paths to it: 1) advancing/skipping from the last link suggestion, which I think means action=(next|accept|reject) on the last link, and 2) clicking "Publish/Submit", which I think means action=submit. Did I get that correct?

There's also proceeding via the skipall_dialog interface. I'll rework this some more and update the notes.

Apart from that, I'm wondering if we can store page_title as well as page_id? We of course want the latter because it doesn't change when pages move, but we might want to store the former to make it easier to look up pages. I'm happy to find a page title using the MariaDB command line, but it can also be quicker to copy & paste page titles from event data in the Data Lake. @MMiller_WMF , what do you think?

Have added that in the latest version of the patch.


@nettrom_WMF high level questions for you in the draft patch:

  1. does the overall combination of action + active_interface look correct to you? Are there items you think are missing from the enum for active_interface or are there others you'd prefer to see removed (e.g. a single item for onboarding instead of how it is split into three currently)?
  2. could you weigh in on @Tgr's comment about possibly using a separate schema for recommendation items to handle interactions with individual link suggestions?

  1. Onboarding, user clicks the "Learn more" link. I'd expect that to be captured as an action="link_click" but what is action_data set to? And in what interfaces is that available? There's some documentation of how action_data is set but not for the onboarding steps.

So, action=link_click and active_interface is onboarding_step_1, and the additional action_data is whether the "Don't show again" box is checked or unchecked. Does that sound OK?

I checked with the current version on the beta cluster, and it looks like the "Learn more" link is no longer in the interface, so we can ignore all of this. @RHo or @MMiller_WMF : can one of you confirm whether that is the case?

  2. Which action corresponds to the user "Opening up the blue link to view the full article of the link target"? Is that action=view_help, or something else?

action=view_help is the ? icon in the mobile link inspector, I'll clarify that.

Thanks for clarifying that! Just so I'm sure I've got this right, clicking on the link to view the full article of the link target of a recommended link is logged with mode = recommendedlinktoolbar_dialog and action = link_click then? I think I just missed that because I didn't grok it the first time around.

There's also proceeding via the skipall_dialog interface. I'll rework this some more and update the notes.

Ah, got it! I was able to check that on the beta cluster and have added a comment to the measurement specification about it so I can update that later.

@nettrom_WMF high level questions for you in the draft patch:

  1. does the overall combination of action + active_interface look correct to you? Are there items you think are missing from the enum for active_interface or are there others you'd prefer to see removed (e.g. a single item for onboarding instead of how it is split into three currently)?

Yeah, I'm really liking that setup! From what I can tell it looks correct, and I don't see anything missing. I also prefer the specificity, I think having three items for onboarding will make it a lot easier to work with.

  2. could you weigh in on @Tgr's comment about possibly using a separate schema for recommendation items to handle interactions with individual link suggestions?

I'll do that, but I'll need a little more time to think about it so I'll get to that tomorrow. It's perhaps best if I continue that discussion in gerrit?

  1. Onboarding, user clicks the "Learn more" link. I'd expect that to be captured as an action="link_click" but what is action_data set to? And in what interfaces is that available? There's some documentation of how action_data is set but not for the onboarding steps.

So, action=link_click and active_interface is onboarding_step_1, and the additional action_data is whether the "Don't show again" box is checked or unchecked. Does that sound OK?

I checked with the current version on the beta cluster, and it looks like the "Learn more" link is no longer in the interface, so we can ignore all of this. @RHo or @MMiller_WMF : can one of you confirm whether that is the case?

The "Learn more" link appears if it is set in MediaWiki:NewcomerTasks.json. Some wikis will have this, others may not. For example, cswiki has a "learnmore" link defined for the references task type but not for the new link-recommendation one https://cs.wikipedia.org/wiki/MediaWiki:NewcomerTasks.json
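
To make that check concrete, here is a small Python sketch that tests whether a task type has a "learnmore" entry. The JSON excerpt is invented for illustration: the assumption that task types are top-level keys, the "group" values, and the "Help:References" target are all hypothetical, so compare against the real MediaWiki:NewcomerTasks.json page before relying on this layout.

```python
import json

# Hypothetical excerpt; the real MediaWiki:NewcomerTasks.json on a wiki may differ.
config_text = """
{
  "references": {"group": "medium", "learnmore": "Help:References"},
  "link-recommendation": {"group": "easy"}
}
"""


def has_learn_more(config: dict, task_type: str) -> bool:
    """Return True if the task type defines a non-empty "learnmore" entry."""
    return bool(config.get(task_type, {}).get("learnmore"))


config = json.loads(config_text)
print(has_learn_more(config, "references"))           # True
print(has_learn_more(config, "link-recommendation"))  # False
```

Under this assumed layout, the "Learn more" link (and therefore the link_click event in onboarding_step_1) would only be possible for task types where the function returns True.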

  2. Which action corresponds to the user "Opening up the blue link to view the full article of the link target"? Is that action=view_help, or something else?

action=view_help is the ? icon in the mobile link inspector, I'll clarify that.

Thanks for clarifying that! Just so I'm sure I've got this right, clicking on the link to view the full article of the link target of a recommended link is logged with mode = recommendedlinktoolbar_dialog and action = link_click then? I think I just missed that because I didn't grok it the first time around.

Yeah, I think so. It would be active_interface=recommendedlinktoolbar_dialog action=link_click

There's also proceeding via the skipall_dialog interface. I'll rework this some more and update the notes.

Ah, got it! I was able to check that on the beta cluster and have added a comment to the measurement specification about it so I can update that later.

@nettrom_WMF high level questions for you in the draft patch:

  1. does the overall combination of action + active_interface look correct to you? Are there items you think are missing from the enum for active_interface or are there others you'd prefer to see removed (e.g. a single item for onboarding instead of how it is split into three currently)?

Yeah, I'm really liking that setup! From what I can tell it looks correct, and I don't see anything missing. I also prefer the specificity, I think having three items for onboarding will make it a lot easier to work with.

  2. could you weigh in on @Tgr's comment about possibly using a separate schema for recommendation items to handle interactions with individual link suggestions?

I'll do that, but I'll need a little more time to think about it so I'll get to that tomorrow. It's perhaps best if I continue that discussion in gerrit?

Sure, that works. FWIW I'm leaning towards not making a separate schema for that, but if you feel like that is the better approach then we can of course do it.

We were trying to think about how this schema might be reusable by future structured tasks (like image recommendations or references) but I think that will be easier to do once we actually have one, so I'm inclined to not try to generalize anything at this point.

The "Learn more" link appears if it is set in MediaWiki:NewcomerTasks.json. Some wikis will have this, others may not. For example, cswiki has a "learnmore" link defined for the references task type but not for the new link-recommendation one https://cs.wikipedia.org/wiki/MediaWiki:NewcomerTasks.json

Aha! That all makes sense to me then, that we log clicks on it as active_interface=onboarding_step_1 action=link_click because it's the only link that's there.

Sure, that works. FWIW I'm leaning towards not making a separate schema for that, but if you feel like that is the better approach then we can of course do it.

We were trying to think about how this schema might be reusable by future structured tasks (like image recommendations or references) but I think that will be easier to do once we actually have one, so I'm inclined to not try to generalize anything at this point.

We're on the same page here. I think at some future point we'd like to create a separate schema, but it's easier to make when we have concrete tasks to design it with. I added a comment on the patch saying the same thing.

Change 684127 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] AddLink: LinkSuggestionInteraction logger and instrumentation

https://gerrit.wikimedia.org/r/684127

Change 681052 merged by jenkins-bot:

[schemas/event/secondary@master] Create structured_task/article/link_suggestion_interaction schema

https://gerrit.wikimedia.org/r/681052

Change 690070 had a related patch set uploaded (by Gergő Tisza; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.5] AddLink: LinkSuggestionInteraction logger and instrumentation

https://gerrit.wikimedia.org/r/690070

Change 684127 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] AddLink: LinkSuggestionInteraction logger and instrumentation

https://gerrit.wikimedia.org/r/684127

Change 690070 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.5] AddLink: LinkSuggestionInteraction logger and instrumentation

https://gerrit.wikimedia.org/r/690070

Mentioned in SAL (#wikimedia-operations) [2021-05-13T12:40:40Z] <tgr@deploy1002> Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments: Backport: instrumentation patches ([[gerrit:690070|]] [[gerrit:690071|]] [[gerrit:690072|]] [[gerrit:690073|]]) (T278116 T278117 T278114 T278177 T278487 T278112 T278111 T278118) (duration: 01m 09s)

Change 690459 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Enable structured_task/article/link_suggestion_interaction schema

https://gerrit.wikimedia.org/r/690459

Change 690459 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable structured_task/article/link_suggestion_interaction schema

https://gerrit.wikimedia.org/r/690459

Change 690661 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] Fix link_suggestion_interaction stream name

https://gerrit.wikimedia.org/r/690661

Change 690661 merged by jenkins-bot:

[operations/mediawiki-config@master] Fix link_suggestion_interaction stream name

https://gerrit.wikimedia.org/r/690661

Mentioned in SAL (#wikimedia-operations) [2021-05-13T19:05:31Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 80e5b9d: cd113a7: Enable structured_task/article/link_suggestion_interaction schema (T278177) (duration: 01m 06s)

AH! We need an example.

I had thought jsonschema-tools would require this!

Making a patch.

Change 691232 had a related patch set uploaded (by Ottomata; author: Ottomata):

[schemas/event/secondary@master] Add example to link_suggestion_interaction 1.0.0

https://gerrit.wikimedia.org/r/691232

Change 691232 merged by jenkins-bot:

[schemas/event/secondary@master] Add example to link_suggestion_interaction 1.0.0

https://gerrit.wikimedia.org/r/691232

Moving to Test in Production.

There is one small note: acceptance_state=undecided is recorded for action: "suggestion_accept". Basically, it's a non-issue (all subsequently logged events regarding 'Yes' answers are correct) unless acceptance_state=undecided is counted outside of this context.

  • Go to the last suggestion and click on it to bring up the context item

Correct: acceptance_state=undecided;manual_focus=true

{action: "suggestion_focus", action_data: "link_target=Prekmurje;link_text=Prekmurje;probabil…son=;acceptance_state=undecided;manual_focus=true", is_mobile: false, active_interface: "recommendedlinktoolbar_dialog", newcomer_task_token: "2718fa42257d6cab0ff8", …}
$schema: "/analytics/mediawiki/structured_task/article/link_suggestion_interaction/1.0.0"
action: "suggestion_focus"
action_data: "link_target=Prekmurje;link_text=Prekmurje;probability_score=0.5709846019744873;series_number=5;rejection_reason=;acceptance_state=undecided;manual_focus=true"
active_interface: "recommendedlinktoolbar_dialog"
dt: "2021-05-24T20:08:41.119Z"
homepage_pageview_token: "9vhlarb7uv4b3nk05st7hi0kqatsioti"
is_mobile: false

  • Click 'Yes'

Incorrect? action: "suggestion_accept", acceptance_state=undecided

{action: "suggestion_accept", action_data: "link_target=Prekmurje;link_text=Prekmurje;probabil…er=5;rejection_reason=;acceptance_state=undecided", is_mobile: false, active_interface: "recommendedlinktoolbar_dialog", newcomer_task_token: "2718fa42257d6cab0ff8", …}
$schema: "/analytics/mediawiki/structured_task/article/link_suggestion_interaction/1.0.0"
action: "suggestion_accept"
action_data: "link_target=Prekmurje;link_text=Prekmurje;probability_score=0.5709846019744873;series_number=5;rejection_reason=;acceptance_state=undecided"
active_interface: "recommendedlinktoolbar_dialog"
dt: "2021-05-24T20:09:04.332Z"
homepage_pageview_token: "9vhlarb7uv4b3nk05st7hi0kqatsioti"
is_mobile: false

  • Click the 'Next' button (it's the last Next, so it brings up the Summary dialog)

Correct: acceptance_state=accepted

{action: "next", action_data: "link_target=Prekmurje;link_text=Prekmurje;probabil…ber=5;rejection_reason=;acceptance_state=accepted", is_mobile: false, active_interface: "recommendedlinktoolbar_dialog", newcomer_task_token: "2718fa42257d6cab0ff8", …}
$schema: "/analytics/mediawiki/structured_task/article/link_suggestion_interaction/1.0.0"
action: "next"
action_data: "link_target=Prekmurje;link_text=Prekmurje;probability_score=0.5709846019744873;series_number=5;rejection_reason=;acceptance_state=accepted"
active_interface: "recommendedlinktoolbar_dialog"
dt: "2021-05-24T20:12:45.289Z"
homepage_pageview_token: "9vhlarb7uv4b3nk05st7hi0kqatsioti"
is_mobile: false

{action: "impression", action_data: "accepted_count=1;rejected_count=0;skipped_count=5", is_mobile: false, active_interface: "editsummary_dialog", newcomer_task_token: "2718fa42257d6cab0ff8", …}
$schema: "/analytics/mediawiki/structured_task/article/link_suggestion_interaction/1.0.0"
action: "impression"
action_data: "accepted_count=1;rejected_count=0;skipped_count=5"
active_interface: "editsummary_dialog"
dt: "2021-05-24T20:12:45.482Z"
homepage_pageview_token: "9vhlarb7uv4b3nk05st7hi0kqatsioti"
is_mobile: false
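
For anyone analysing these events, the action_data strings above look like semicolon-separated key=value pairs. The Python sketch below splits them into a dict and shows one possible way to work around the acceptance_state=undecided value logged with suggestion_accept. Both function names are hypothetical, and treating the action as authoritative for suggestion_accept is an assumption made for analysis, not behaviour defined by the schema.

```python
def parse_action_data(action_data: str) -> dict:
    """Split "key=value;key=value" text into a dict; empty values stay as ""."""
    # Caveat: a value that itself contains ";" would be split incorrectly.
    pairs = (item.partition("=") for item in action_data.split(";") if item)
    return {key: value for key, _, value in pairs}


def effective_acceptance_state(event: dict) -> str:
    """Assumption: for suggestion_accept, trust the action over acceptance_state."""
    if event["action"] == "suggestion_accept":
        return "accepted"
    return parse_action_data(event["action_data"]).get("acceptance_state", "")


# Fields copied from the suggestion_accept event logged above.
event = {
    "action": "suggestion_accept",
    "action_data": (
        "link_target=Prekmurje;link_text=Prekmurje;"
        "probability_score=0.5709846019744873;series_number=5;"
        "rejection_reason=;acceptance_state=undecided"
    ),
}

fields = parse_action_data(event["action_data"])
print(fields["acceptance_state"])         # undecided
print(effective_acceptance_state(event))  # accepted
```

The same parser applies to the editsummary_dialog impression, where accepted_count=1;rejected_count=0;skipped_count=5 becomes a dict of three string values.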