
Finalize on schema for the new instrumentation
Closed, ResolvedPublic

Description

The schema that we want to implement is https://meta.wikimedia.org/wiki/Schema:Popups


As @leila notes on the Trello board-

The current question proposed by the team is: can the decrease in pageviews in the presence of Hovercards be compensated by the number of Hovercards being used by users (readers/editors)?
As part of this research Analytics will help UX to finalize 1 or more research questions to ask and to test for/verify via the data collected during the three month trial period. Analytics will also work with the team to make sure the schema is correct, and is capturing the right set of data.


This is the instrumentation that we are planning to implement.

Once we've made it default on a particular language wiki there can be two user groups:
A. People who'll disable it using the settings (within Hovercards, not MediaWiki preference)
B. People who'll continue to use Hovercards

The instrumentation for both would be the same barring a few
differences that I'll point out at the end. The schema will roughly be
as follows:

  1. Unique session id
  2. Duration (in ms) for how long the hovercard was open
  3. Whether the hovercard or the link that was being hovered on was:
    1. Dismissed: The mouse left the area of the link/hovercard.
    2. Opened in New Tab: Command/Ctrl + Click or Scroll Click
    3. Opened in New Window: Shift + Click
    4. Opened in Same Window: Click
  4. Is hovercard enabled?
  5. The popup delay (default/value set by user/value set by wiki's common.js)

For user group A:

  1. Values (2) and (5) will not be logged.
  2. (3) will be logged for the links themselves as there will be no hovercard to show.
  3. Consequently, (3.1) will never be logged for A
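
The rules above can be sketched as a small validation check. This is only an illustration of the description, not the actual Schema:Popups definition: the field names (`session_id`, `duration_ms`, `popup_delay_ms`, `hovercards_enabled`) and the action strings are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical action labels for item (3); the real schema may name them differently.
ACTIONS = {
    "dismissed",
    "opened in new tab",
    "opened in new window",
    "opened in same window",
}


@dataclass
class PopupEvent:
    session_id: str                        # item (1)
    action: str                            # item (3)
    hovercards_enabled: bool               # item (4)
    duration_ms: Optional[int] = None      # item (2), only for user group B
    popup_delay_ms: Optional[int] = None   # item (5), only for user group B


def validate(event: PopupEvent) -> List[str]:
    """Return a list of problems; an empty list means the event follows the rules."""
    problems = []
    if event.action not in ACTIONS:
        problems.append(f"unknown action: {event.action!r}")
    if event.hovercards_enabled:
        # User group B: duration and popup delay are expected.
        if event.duration_ms is None:
            problems.append("duration_ms is required when Hovercards is enabled")
        if event.popup_delay_ms is None:
            problems.append("popup_delay_ms is required when Hovercards is enabled")
    else:
        # User group A: (2) and (5) are not logged, and (3.1) can never occur.
        if event.duration_ms is not None or event.popup_delay_ms is not None:
            problems.append("duration and popup delay must not be logged for group A")
        if event.action == "dismissed":
            problems.append("'dismissed' cannot be logged when Hovercards is disabled")
    return problems
```

A check like this could be run against a sample of logged events during data QA to catch events that mix the two user groups.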

We need to confirm that this will give us the data we need to validate the case for Hovercards

Event Timeline

Prtksxna assigned this task to leila.
Prtksxna raised the priority of this task from to High.
Prtksxna updated the task description.
Prtksxna added a project: Page-Previews.
Prtksxna added subscribers: Aklapper, Prtksxna.

@Prtksxna, two questions:

  1. is this a logged-in or logged-out feature?
  2. the default will be Hovercard on. If the user disables Hovercard and the user is logged-out, will his/her choice be reset after n days?

@Prtksxna, two questions:

  1. is this a logged-in or logged-out feature?

Both, it'll be the default behavior everywhere.

  1. the default will be Hovercard on. If the user disables Hovercard and the user is logged-out, will his/her choice be reset after n days?

No, the choice won't be reset.
(@Jaredzimmerman-WMF, correct me if I am wrong)

  1. the default will be Hovercard on. If the user disables Hovercard and the user is logged-out, will his/her choice be reset after n days?

No, the choice won't be reset.
(@Jaredzimmerman-WMF, correct me if I am wrong)

Are you doing this using a cookie? If so, most of the cookies I've seen reset after 30 days. How long will your cookie persist?

If the only question you want to answer is: whether or not the introduction of Hovercards changes the pageview numbers significantly, the answer to this question is not important. If you want to do user analysis to see how they used Hovercards, however, the answer to this question is important.

Are you doing this using a cookie?

We use $.jStorage to save this. If it's in localStorage, which it most probably should be, then we won't have the expiry problem.

@Prtksxna, you should have a definitive answer for whether your data will suffer from the expiry problem or not. You can test this yourself. Try Hovercards from your browser while logged-out, disable Hovercards, and reset your browser cookies. If after doing that, you see Hovercards again, this is undesirable behavior.

WARNING: $.jStorage is using cookies. Popups has the expiry problem!

@leila, thanks for pointing out this problem. I think ResourceLoader takes up all localStorage.

thanks for pointing out this problem. I think ResourceLoader takes up all localStorage.

Yes, even though ResourceLoader now cleans up old cached JS files after itself. My localStorage, and that of many other users I know, is still full from before this was fixed. I don't really know whether it was in fact fixed at some point :-)

Thanks for checking @Prtksxna.

@Jaredzimmerman-WMF, I understand that the community has asked for a three-month trial period for everyone. However, given the limitations for logged-out users' control over Hovercards discussed here, I recommend that you reconsider how you push this feature. A better approach is to push it out gradually. For example, implement userToken and push Hovercards to 10% of userTokens in each language. Look at the results after 1-2 weeks, if you feel comfortable about the feature, push it to more users.
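
One way such a gradual rollout could work is to bucket each userToken deterministically, so the same token always gets the same answer and raising the percentage only ever adds users. The sketch below is an assumption about how this might be done, not an existing MediaWiki mechanism; the function name `in_rollout` and the choice of SHA-256 are illustrative.

```python
import hashlib


def in_rollout(user_token: str, wiki: str, percent: int) -> bool:
    """Decide whether a token falls into the first `percent` of 100 buckets.

    The wiki name is mixed into the hash so that a token is bucketed
    independently in each language.
    """
    digest = hashlib.sha256(f"{wiki}:{user_token}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < percent
```

Because the bucket depends only on the token and the wiki, a token enabled at 10% stays enabled when the rollout grows to 50% or 100%.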

@leila do we have a way to do that? incremental per wiki rollouts both for logged in and logged out users? I know something similar was done for HHVM. If we went with a plan like that I worry that the messaging would be rather complex.

Why don't we set a longer cookie? 90 days? 360 days? How does an incremental rollout solve the cookie issue? Will we be able to easily differentiate users who have it disabled (but available) from users it didn't roll out to?

@leila do we have a way to do that? incremental per wiki rollouts both for logged in and logged out users? I know something similar was done for HHVM. If we went with a plan like that I worry that the messaging would be rather complex.

We can at least do it in two steps. First show it to n% of the population. If the results are satisfactory, we push it to the rest of the population.

What do you mean by messaging?

Why don't we set a longer cookie? 90 days? 360 days? How does an incremental rollout solve the cookie issue? Will we be able to easily differentiate users who have it disabled (but available) from users it didn't roll out to?

To answer this, we need to know what you want to do with Hovercards when it's fully in production. If you let me set preferences as a logged-out user (for example, disable Hovercards), how do you make sure that the feature stays disabled for me? For example, my browser is set to reset cookies each time I close it, and I close it every day. This means that every day I'll see Hovercards when reading Wikipedia, and if I don't want to see it, I have to opt out every day. How do you want to handle this situation at scale? One thing you can do is show Hovercards only to logged-in users and let them opt out if they don't want to see it. This way, we will always know their settings. However, this means the majority of users won't see it unless they log in, which may be fine: you have to log in to use special features like Hovercards.

@leila if, after the trial we allow logged out users to set a preference for hovercards we'll handle it in the same or similar way to mediaviewer. Due to how poor the experience is for setting preferences in cookies I'd prefer to not offer the option at all.

If we're moving toward longer cookies (e.g. T68699: Increase "remember me" login cookie expiry from 30 days to 1 year on Wikimedia wikis) I don't know why we couldn't have this preference have a similar expiry if we allow settings for IP users at all.

@Jaredzimmerman-WMF, let's make it simple then. If you can make the cookies not expire during the trial, please do it. It makes the data more reliable.

As for the schema itself, @Prtksxna: which of the actions records the case where a hovercard is opened on the page but the user does not click through to open a new page (in the same tab, in another tab, etc.)? This is important data to have if you later want to count one hovercard view as equivalent to one pageview.
Second question: what does the schema log if Hovercards is disabled?

@Jaredzimmerman-WMF, @Prtksxna

@Nuria and I discussed the schema this morning. Here's the outcome of our discussion with next step items.

Question
Based on the discussion with Jared, we want to make sure we have a schema that can answer one question at the end of the trial: Do Hovercard views change pageviews? If yes, how can the impact be described at a high level?

To answer this question, we propose the schema be updated as recommended below.

Analysis
We will sample Wikipedia pages in each of the three languages and will log all Hovercard interactions in those pages in the trial period. We will then compare the number of pageviews in the presence of Hovercards feature with the number of pageviews in the absence of Hovercards, for example, in the month of February, considering the sample pages we log data for in each language. Focusing on specific pages will enable us to do more accurate analysis on the pageview comparisons.
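
The per-page comparison described here could look roughly like the sketch below. It assumes two mappings from page_id to pageview count, one for a baseline period and one for the trial period; the function name `relative_change` and the input shape are hypothetical, and a real analysis would also need to account for seasonal variation in traffic.

```python
from typing import Dict


def relative_change(baseline: Dict[int, int], trial: Dict[int, int]) -> Dict[int, float]:
    """Relative pageview change per sampled page: (trial - baseline) / baseline.

    Pages missing from the trial data, or with zero baseline views,
    are skipped because no ratio can be computed for them.
    """
    changes = {}
    for page_id, before in baseline.items():
        after = trial.get(page_id)
        if after is None or before == 0:
            continue
        changes[page_id] = (after - before) / before
    return changes
```

A negative value means the sampled page received fewer pageviews during the trial than in the baseline period.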

Schema

  • Add page_id_source: this item will capture the page_id of the page the Hovercard is triggered on. For example, if the user is on page X and hovers over link Y, page_id_source will be X.
  • Add page_id_hover: this item will capture the page_id corresponding to the link on which the Hovercard was triggered, Y in the above example.
  • Remove sessionId: we don't need this data for comparing pageview counts.
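
With the two page ids in place, hovercard views that did not lead to a page open can be counted per hovered page. The sketch below is a minimal illustration under assumptions: the `PopupEvent` shape and the `"dismissed"` action label are placeholders, not the final schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PopupEvent:
    page_id_source: int  # page the user was reading (X)
    page_id_hover: int   # page the hovered link points to (Y)
    action: str          # hypothetical label, e.g. "dismissed"


def views_without_navigation(events: Iterable[PopupEvent]) -> Counter:
    """Count, per hovered page, the hovercards that were shown and then dismissed."""
    return Counter(e.page_id_hover for e in events if e.action == "dismissed")
```

These counts are the ones that would matter if a hovercard view were later treated as equivalent to a pageview of the hovered page.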

@Jaredzimmerman-WMF, before proceeding further, please let us know if you have any concerns. :-)

@Prtksxna, please go ahead and make the changes to the schema once Jared approves it. Please set aside some time for the data QA that we should do before turning on the feature for everyone in these languages.

great! @Prtksxna, please ping if you'd like to chat more. If we don't hear back by Thursday, we assume this task is done.

I'll mark this as resolved per earlier comment.

As for the schema itself, @Prtksxna: which of the actions records the case where a hovercard is opened on the page but the user does not click through to open a new page (in the same tab, in another tab, etc.)? This is important data to have if you later want to count one hovercard view as equivalent to one pageview.

That would be 3.1 from the description of this issue, that is - dismissed.

Second question, what does the schema log if Hovercard is disabled?

That would be user group A, and the way it would be logged is documented in the second half of the bug's description. Quoting from it-

  • Values (2) and (5) will not be logged.
  • (3) will be logged for the links themselves as there will be no hovercard to show.
  • Consequently, (3.1) will never be logged for A

Analysis
We will sample Wikipedia pages in each of the three languages and will log all Hovercard interactions in those pages in the trial period. We will then compare the number of pageviews in the presence of Hovercards feature with the number of pageviews in the absence of Hovercards, for example, in the month of February, considering the sample pages we log data for in each language. Focusing on specific pages will enable us to do more accurate analysis on the pageview comparisons.

I remember Dario mentioning that analyzing pageviews for language wikis is a bad idea as the numbers aren't very predictable, thus comparing the two time periods might not be the best way to measure the success or failure of Hovercards.

By looking at only a select few sample pages are we able to get rid of that problem?

Schema

  • Add page_id_source: this item will capture the page_id of the page the Hovercard is triggered on. For example, if the user is on page X and hovers over link Y, page_id_source will be X.
  • Add page_id_hover: this item will capture the page_id corresponding to the link on which the Hovercard was triggered, Y in the above example.
  • Remove sessionId: we don't need this data for comparing pageview counts.

This makes sense. I guess we'd have to remove sessionId anyway if we wanted to track pages as we can't track both simultaneously according to legal.

@Prtksxna, please go ahead and make the changes to the schema once Jared approves it. Please set aside some time for the data QA that we should do before turning on the feature for everyone in these languages.

@Prtksxna, sorry it took some time to get back to you. I assumed this task was done.

I'm trying to figure out the fastest way to finalize this schema and make sure we have good enough data to comment on the one question the team is asked to answer: Will HC reduce pageview counts significantly? I think the best way is for you and me to set up a call and nail down the schema given the other constraints and timelines. I'll send you an invite for Monday.

Closed as per our discussion on call.
We'll go ahead with the suggestions made in T88166#1055167

@Prtksxna note that we need to keep track of the language in the schema. You don't have a language or wiki item for that in revision 11528589.

@Prtksxna note that we need to keep track of the language in the schema. You don't have a language or wiki item for that in revision 11528589.

Isn't the source wiki something event logging logs automatically by default? I got that impression from https://phabricator.wikimedia.org/T91272#1082224

@Nuria, I just want to make sure that the xxwiki ends up in the Popup table in the database, otherwise, the analysis becomes complicated each time we want to run simple queries like "what percentage of those we log for have disabled Hovercards in ctwiki?" With this in mind, should we instrument language as part of the schema or not?

thanks!

@leila, the 'wiki' will always be logged (enwiki, dewiki...) with each event
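
With the wiki recorded on every event, a per-wiki breakdown of the kind asked about above becomes a simple aggregation. The sketch below assumes each event has been reduced to a `(wiki, hovercards_enabled)` pair; that shape and the function name are illustrative. Note that it counts events, not users, since the schema no longer carries a session id.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def disabled_share_by_wiki(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Percentage of logged events per wiki that came from users with Hovercards disabled."""
    totals = defaultdict(int)
    disabled = defaultdict(int)
    for wiki, hovercards_enabled in events:
        totals[wiki] += 1
        if not hovercards_enabled:
            disabled[wiki] += 1
    return {wiki: 100.0 * disabled[wiki] / totals[wiki] for wiki in totals}
```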

@Prtksxna sorry for the confusion. I didn't know it gets automatically logged and it will show up in all the tables. Thanks @Nuria for clarifying this.