Help panel: Develop Schema
Closed, Resolved, Public

Description

We developed questions that we would like to answer during the Help Panel intervention, which will need a schema. The questions can be found in parent task T206719.

Measurement specifications document

Event Timeline

JTannerWMF triaged this task as Medium priority. Nov 20 2018, 5:17 PM
JTannerWMF created this task.
JTannerWMF updated the task description.
JTannerWMF moved this task from Inbox to Upcoming Work on the Growth-Team board.

Per meeting today, unassigning myself in case @Catrope or @SBisson want to take the lead.

MMiller_WMF renamed this task from "Help pane: Develop Schema" to "Help panel: Develop Schema". Nov 29 2018, 7:15 PM
MMiller_WMF updated the task description.

I think we want to track events when the user:

  • Sees the panel (panel is displayed)
  • Opens the panel
  • Closes the panel
  • Advances to the question review panel
  • Goes back to the home panel
  • Clicks a link in the panel
  • Types something into the text box (i.e. text box changed from empty to non-empty)
  • Submits a question
  • Submission succeeds
  • Submission fails

I can think of the following data that we'd want to track for each event:

  • User ID
  • Which editor (visual, wikitext, wikitext 2017)
  • Desktop or mobile web site
  • Name of the page being edited
  • Protection status of the page / whether the user is allowed to edit the page
  • For link click events: which link was clicked
    • Do we track the link text, link URL, or both?
    • Alternatively, we could add IDs/symbolic names to the link config, and track that ID here instead
  • For submission success events: some way of identifying/tracking the question
    • revid of the edit (to the help desk page) adding the question?
    • section name? (may not be unique)
    • Do we want to inject some sort of ID / tracking thing when posting questions?
    • In a future in which we support Flow-based help desks, we should use the topic ID for those
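
A rough sketch of the "symbolic names in the link config" alternative from the list above, in case it helps the discussion. The link IDs, texts, URLs and event field names here are all made up for illustration; none of them were defined at this point in the task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelpLink:
    link_id: str  # hypothetical symbolic name, stable even if text or URL changes
    text: str
    url: str


# Hypothetical link configuration; the real links and IDs are not defined here.
HELP_LINKS = [
    HelpLink("writing-articles", "Writing good articles", "https://example.org/help/writing"),
    HelpLink("adding-images", "How to add an image", "https://example.org/help/images"),
]


def link_click_event(link: HelpLink, user_id: int, editor: str) -> dict:
    """Build a link-click event that records only the link's symbolic ID.

    Neither the link text nor the URL is logged, so both can change in the
    config without breaking comparisons across time.
    """
    return {
        "action": "link-click",
        "action_data": link.link_id,
        "user_id": user_id,
        "editor": editor,
    }
```

The trade-off is that analysis then depends on the IDs staying stable, so renaming an ID in the config would need the same care as a schema change.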

hi @Catrope - just wanted to check if the following would be captured by, or included in addition to the above list:

  • Namespace in which the help panel is triggered (if we decide to add help panel to Main, Talk, User pages, etc)
  • User adds an email on the review question step
  • Count of when the question is the *first edit* made by a user?
  • User clicks on the link to view their question on "Help desk" (from the confirmation step)
  • Can we track if a user edits their question on the Help desk after it is posted?
  • To what extent can we track when HD responses to help panel questions are posted and whether users go to HD to view the responses?

Additional items pending further discussion:

  • When help panel is turned on or off by the user in Preferences (pending outcome of discussion on T206716)
  • If T209301 is possible for V1, is it possible to see search terms entered and search results clicked?

Count of when the question is the *first edit* made by a user?
Can we track if a user edits their question on the Help desk after it is posted?

I think @nettrom_WMF can pull this out of the Mediawiki database pretty easily if we do T211118: Help panel: provide tag for help panel related edits.

"Name of the page being edited"

In EditorJourney we were careful to obfuscate this data in NS_MAIN and other sensitive namespaces. Even though the user is viewing the edit screen they are still in "reading" mode unless they actually make an edit, at which point anyone can see that they've edited a page. Do we want to obfuscate or redact identifying information of the pages that they are viewing?

@RHo for the settings cog, I'd like to propose a popup with the link and a sentence explaining to the user to uncheck the "Enable editor help panel" checkbox. That way we inform the user about how to enable/disable without directing them to a settings page where they have to figure it out on their own. Within that same popup, that might be good place for "More about this feature" / "Feedback". If we do this, then we'll want to incorporate settings-click and preferences-click, more-about-this-feature-click into the schema.

  • Namespace in which the help panel is triggered (if we decide to add help panel to Main, Talk, User pages, etc)

Yes

  • User adds an email on the review question step

No, but good suggestion, I'll try to work that in.

  • Count of when the question is the *first edit* made by a user?

As Kosta says, this can be derived from other data, but adding a field for the user's edit count is easy, so I'll just do that.

  • User clicks on the link to view their question on "Help desk" (from the confirmation step)

We have a generic "user clicks on link" event that would specify the ID / symbolic name of the link that was clicked as an additional piece of data. We haven't defined what these IDs are yet, but I'm hoping that we can assign one to every link, and that the "view question" link will also have an ID.

  • Can we track if a user edits their question on the Help desk after it is posted?

Morten will be able to see if the user edits the help desk page after posting their question. We won't know exactly what they're editing without manual inspection, but we can make an educated guess that if the user e.g. edits the help desk again within a certain amount of time and/or without intervening edits, they are probably editing their question. Note that it's also possible that they got an answer and are responding to that answer; but the help desk is a big page and we can't easily tell if an intervening edit answered the user's question or is part of the discussion about a completely different question.
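
To make the "educated guess" concrete, here is a minimal sketch of such a heuristic. It assumes the help desk page history is available as time-sorted (timestamp, username) pairs, and it requires both conditions (within a time window and no intervening edits by others); the one-hour window is an arbitrary placeholder, not something that was agreed on.

```python
from datetime import datetime, timedelta


def probably_edited_own_question(edits, user, posted_at, window=timedelta(hours=1)):
    """Guess whether `user` went back and edited the question they just posted.

    edits: list of (timestamp, username) tuples for the help desk page,
           sorted by timestamp (assumed input format).
    posted_at: timestamp of the edit that added the question.
    window: maximum delay still counted as "editing the question" (assumption).
    """
    for timestamp, username in edits:
        if timestamp <= posted_at:
            continue  # skip the posting edit and everything before it
        if username != user:
            # Someone else edited first; this may be an answer, or an
            # unrelated question, so we cannot make the guess.
            return False
        return timestamp - posted_at <= window
    return False  # no later edits at all
```

As noted above, this cannot tell a correction from a reply to an answer, so the result is only a rough signal.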

  • To what extent can we track when HD responses to help panel questions are posted

Can't be done without either manual inspection or some sort of bulk after-the-fact analysis of the help desk page contents themselves (both of which are out of scope for schema design)

and whether users go to HD to view the responses?

Can't be done unless we either luck out and they do it within the first 24h of their account lifetime (in which case EditorJourney will catch it), or we modify the criteria for EditorJourney tracking to include page views to the help desk for a prolonged period of time (>24h) for users who have used the help panel. Even then, what we would know is whether the user viewed the help desk page; we wouldn't know whether they viewed their question specifically, and it'd be some work for us to know whether their question has an answer (again requiring separate analysis of the content of the help desk page).

Additional items pending further discussion:

  • When help panel is turned on or off by the user in Preferences (pending outcome of discussion on T206716)

There is an existing schema called PrefUpdate that tracks all preference changes, I believe Morten should be able to cross-reference that.

  • If T209301 is possible for V1, is it possible to see search terms entered and search results clicked?

Possible yes, but storing people's search terms is a bit of a can of worms privacy-wise. I'll leave it to Marshall, Kosta and the Legal people whether that's something we want to track. For EditorJourney, we decided not to store search terms that people enter in the main search bar. But arguably this is a bit different because it's much more narrowly focused, and even if we don't store the search terms in a way that links them to the user that typed them, it does sound like a good idea to separately store an aggregate anonymized list of search terms that people enter into this form.
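
For what it's worth, the aggregate list could be as simple as the sketch below. The event shape (a "query" key) is hypothetical, and the minimum-count cutoff is my own assumption to keep one-off, possibly identifying queries out of the stored list; whether that is sufficient is a question for the privacy review, not something this code settles.

```python
from collections import Counter


def aggregate_search_terms(events, min_count=2):
    """Count normalized search terms without keeping any user identifier.

    events: iterable of dicts with a "query" key (hypothetical shape).
    min_count: terms seen fewer times than this are dropped (assumption).
    """
    counts = Counter(
        event["query"].strip().lower()
        for event in events
        if event.get("query", "").strip()
    )
    return {term: n for term, n in counts.items() if n >= min_count}
```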

Count of when the question is the *first edit* made by a user?
Can we track if a user edits their question on the Help desk after it is posted?

I think @nettrom_WMF can pull this out of the Mediawiki database pretty easily if we do T211118: Help panel: provide tag for help panel related edits.

Yes. Adding the edit count as I mentioned above is also pretty easy though.

"Name of the page being edited"

In EditorJourney we were careful to obfuscate this data in NS_MAIN and other sensitive namespaces. Even though the user is viewing the edit screen they are still in "reading" mode unless they actually make an edit, at which point anyone can see that they've edited a page. Do we want to obfuscate or redact identifying information of the pages that they are viewing?

I don't think this is necessary, because once someone sees (let alone interacts with) the help panel, they have already initiated an edit session, and the EditAttemptStep events logged for that will contain the unredacted page title already. If we decide to display the help panel in other contexts later, that would change things.

That also reminds me: would it be a good idea to enable oversampling of EditAttemptStep events when the help panel is enabled? On the one hand, it seems useful to be able to cross-reference to that schema; on the other hand, I don't think that anything that's been requested requires that, and the increase in the number of events would be more substantial than that done by EditorJourney.

I don't think we need oversampling of EditAttemptStep in addition to what is already done in relation to EditorJourney. While working on putting the specifications together, I haven't seen a question that requires data from EditAttemptStep.

I like the schema so far and many of the suggestions that have come up (e.g. adding the edit count). A few thoughts:

  • Having some kind of ID or symbolic name for each link would be really useful as we have questions related to clicking specific links, both links to help information as well as the link to view their question on the help desk.
  • Being able to track that the user clicked on the cog would be useful. I'm not sure what the current plan for that is? As Roan mentions, if the user goes through and changes their preference we will be able to identify that through the PrefUpdate schema.
  • Can we store the length of the question (amount of content entered in the question field) as well?
  • Do we need the session token? I would like to have the page token, because that'll make it easy to group events together for a specific edit, but I'm unsure if we need the session token for anything.

That's what I have. Let me know if I missed something in the previous comments that I should respond to, or if something's unclear.

I don't think we need oversampling of EditAttemptStep in addition to what is already done in relation to EditorJourney. While working on putting the specifications together, I haven't seen a question that requires data from EditAttemptStep.

OK, good to know.

  • Being able to track that the user clicked on the cog would be useful. I'm not sure what the current plan for that is? As Roan mentions, if the user goes through and changes their preference we will be able to identify that through the PrefUpdate schema.

We can make that a separate type of action. I think that would be preferable to abusing the link-click action.

  • Can we store the length of the question (amount of content entered in the question field) as well?

Sure. I think the most logical place to put this would be the action data for the `submit-attempt` action.

  • Do we need the session token? I would like to have the page token, because that'll make it easy to group events together for a specific edit, but I'm unsure if we need the session token for anything.

Happy to remove it if you don't think we need it. I included both because that's what EditAttemptStep does, but while edit sessions can extend across multiple page views, help panel sessions can't.

  • Being able to track that the user clicked on the cog would be useful. I'm not sure what the current plan for that is? As Roan mentions, if the user goes through and changes their preference we will be able to identify that through the PrefUpdate schema.

We can make that a separate type of action. I think that would be preferable to abusing the link-click action.

On reflection, it functions exactly like a link (opens a URL in a new tab), so I think we should just track it as a link-click action with a specific ID.

That’s going to change somewhat: T211400

Do we want to maybe add search as an action? It wouldn't need to track what the user searched, only that they searched something and have now moved to the search results panel. Then when they go back to the home panel, we could log the back-home action.

@SBisson -- I just went through the whole "Help Panel -- measurement specifications" document, and also went through this whole Phab task. As you pick up this task, I think it would probably be good to take a close read of the document (including the comments) and compare it to what @Catrope has drafted for the schema, because I think that the document is a more recent set of requirements that came together after @Catrope assembled the draft schema. If you see comments or discrepancies that should be updated in the document, please go ahead and make the updates to the text of the document and resolve the associated comment.

I've updated Roan's schema based on the documents and discussion. It is now on meta at https://meta.wikimedia.org/wiki/Schema:HelpPanel

Per T206711#4835724, I've marked the schema "active".

We're only missing the purge strategy. Should it be the same thing as EditorJourney?

@SBisson -- @nettrom_WMF will help us figure out the purge strategy in January via T212464. I think for now, you can make it the same as EditorJourney.

Checked in betalabs against the specification at https://meta.wikimedia.org/wiki/Schema:HelpPanel. Data is recorded from both desktop and mobile. I checked the 11 actions listed below; submit-failure was not checked.

+---------------------+
| event_action        |
+---------------------+
| impression          |
| open                |
| enter-question-text |
| cog-close           |
| review              |
| close               |
| submit-attempt      |
| submit-success      |
| link-click          |
| back-home           |
| cog-open            |
+---------------------+

event_action='link-click' provides the following data (along with all the other information, such as the title of the page where the question was initiated):

+---------------------+
| event_action_data   |
+---------------------+
| special-preferences |
| view-question       |
| view-more           |
| example             |
+---------------------+
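
A check like this could be scripted for future test rounds. The sketch below validates an event row against the action names and link IDs observed above, plus submit-failure; the dict-based row shape and the function name are assumptions, and the sets would need updating whenever the schema changes.

```python
# Actions observed in betalabs, plus submit-failure, which was not exercised.
KNOWN_ACTIONS = {
    "impression", "open", "close", "enter-question-text", "review",
    "back-home", "submit-attempt", "submit-success", "submit-failure",
    "link-click", "cog-open", "cog-close",
}

# Link IDs observed in event_action_data for link-click events.
KNOWN_LINK_IDS = {"special-preferences", "view-question", "view-more", "example"}


def validate_event(row):
    """Return a list of problems found in one event row (empty if none).

    row: dict with "event_action" and optionally "event_action_data",
         mirroring the column names in the log table.
    """
    problems = []
    action = row.get("event_action")
    if action not in KNOWN_ACTIONS:
        problems.append(f"unknown action: {action!r}")
    if action == "link-click" and row.get("event_action_data") not in KNOWN_LINK_IDS:
        problems.append(f"unknown link ID: {row.get('event_action_data')!r}")
    return problems
```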

@kostajh During my meeting with @MMiller_WMF today we discussed storing the HelpPanel data, and when I was walking through the schema I noticed that we do not have a session identifier that we can store (we have page_token and session_token in the schema, but those cannot be stored indefinitely). I would like to have an identifier that allows us to combine all interaction with the help panel that occurs during the same editing session, and that we can store for the duration of our Help Panel experiment. Looking at EditAttemptStep, I noticed that it has an editing_session_id field. Could we add something like that to the HelpPanel schema? Might be possible to reuse it for all I know, I'll leave the implementation details to you.

This doesn't make much sense to me. How can it be that it's not OK to store page_token (whose lifetime is shorter than editing_session_id) and not OK to store session_token (whose lifetime is longer than editing_session_id), but it is OK to store editing_session_id (whose lifetime is in between)?

Lifetimes:

  • page_token: Unique to each page load; lives until the user navigates away from the page
  • session_token: Persists across page loads; lives until the user closes their browser (or stays away from the site for many days, maybe?)
  • editing_session_id: Unique to each edit attempt; can persist across page loads in limited circumstances. AFAIK this only persists when submitting an edit form or switching between visual and wikitext mode (and I believe some types of switches don't persist it). The (not fully realized, AFAIK) intent of this token is to identify a single editor session/"view" (from opening the editor to saving/abandoning), even if due to implementation details that extends across multiple page views (in VE it's all one page view, but in the 2010 wikitext editor every save attempt and every preview is a new page view).

From @nettrom_WMF

I'd like to propose that we add editing_session_id to the schema, and reuse the same ID that the EditAttemptStep schema uses (since the help panel is used during editing, and the definition of an editing session that Roan describes in the Phab task makes sense to me). I don't see a need for page_token since I think it's unlikely that we can store it past 90 days (because it can be cross-referenced with ReadingDepth to get page namespace and title).
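
To illustrate what the session identifier buys us on the analysis side, here is a minimal sketch that groups help panel events into per-session action sequences. It assumes each event carries an editing_session_id field as proposed above, plus a sortable timestamp; the exact field names in the final schema may differ.

```python
from collections import defaultdict


def group_by_editing_session(events):
    """Map each session ID to its actions in chronological order.

    events: iterable of dicts with "editing_session_id", "timestamp"
            and "action" keys (assumed field names).
    """
    sessions = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        sessions[event["editing_session_id"]].append(event["action"])
    return dict(sessions)
```

With sequences like these, questions such as "how many sessions that opened the panel went on to submit a question" become a simple scan over each list.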

Change 482747 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/extensions/GrowthExperiments@master] Help panel: Use our own session ID instead of page_token

https://gerrit.wikimedia.org/r/482747

Change 482747 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Help panel: Use our own session ID instead of page_token

https://gerrit.wikimedia.org/r/482747

Change 482824 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[mediawiki/extensions/GrowthExperiments@master] Stop logging page_token

https://gerrit.wikimedia.org/r/482824

Change 482824 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Stop logging page_token

https://gerrit.wikimedia.org/r/482824

The HelpPanel table has not been updated in betalabs eventlogging for quite a long time:

MariaDB [log]>  SELECT TABLE_NAME,UPDATE_TIME FROM information_schema.tables WHERE  TABLE_SCHEMA = 'log' AND TABLE_NAME in ('EditAttemptStep_18530416', 'EditorJourney_18504997', 'HelpPanel_18721886');
+--------------------------+---------------------+
| TABLE_NAME               | UPDATE_TIME         |
+--------------------------+---------------------+
| EditAttemptStep_18530416 | 2019-01-18 00:54:17 |
| EditorJourney_18504997   | 2019-01-18 01:00:17 |
| HelpPanel_18721886       | 2019-01-04 22:33:14 |
+--------------------------+---------------------+
3 rows in set (0.00 sec)

And event_page_token is present in the table.

We updated the version of the HelpPanel schema. Look for a table named HelpPanel_18766440

Confirmed that the schema data in the Data Lake does not contain the page_token field. As far as I'm concerned, the schema development is now complete, so I'm reassigning to @MMiller_WMF so he can review/close as needed.