
Create EventLogging schema to store quick surveys responses.
Closed, Resolved | Public | 2 Estimated Story Points

Description

We need an event logging schema ready for sending the responses of the quick surveys.

Initial version of fields needed:

So "platform" will be a mandatory enum value with "web", "androidapp", and "iosapp" as options.

"presentation" is an optional string. In the description of this field, maybe try this:

For web, specify the skin name. For apps, specify the form factor and the stage (i.e., {tablet|phone|wearable}-{stable|beta|alpha|prototype}) such as "tablet-alpha".

  • String: survey code name
  • String: survey response value (e.g., answer the user selected, using the i18n key (not the localized value))
  • String: platform (clients would specify "web" or "app")
  • String: presentation (clients would specify: for the web, the skin name; for apps, the form factor and the stage (i.e., {tablet|phone|wearable}-{stable|beta|alpha|prototype}) such as "tablet-alpha")
  • Boolean: whether the user was logged in.
  • Enum: editCountBucket ( "0 edits", "1-4 edits", "5-99 edits", "100-999 edits", "1000+ edits")
  • String: Country code, if known (n.b., this is available from the GeoIP cookie's first field in its colon separated list). "Unknown" if unknown.
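As a rough illustration of two of the derived fields above, here is a minimal Python sketch. The helper names `edit_count_bucket` and `country_from_geoip_cookie` are hypothetical, and the cookie value used in the usage note is made up; the only things taken from the description are the bucket labels, the "first field of a colon-separated list" rule, and the "Unknown" fallback.

```python
from typing import Optional

# Enum values copied from the field list above.
EDIT_COUNT_BUCKETS = (
    "0 edits",
    "1-4 edits",
    "5-99 edits",
    "100-999 edits",
    "1000+ edits",
)


def edit_count_bucket(edit_count: int) -> str:
    """Map a raw edit count onto one of the editCountBucket labels."""
    if edit_count < 0:
        raise ValueError("edit count cannot be negative")
    if edit_count == 0:
        return "0 edits"
    if edit_count <= 4:
        return "1-4 edits"
    if edit_count <= 99:
        return "5-99 edits"
    if edit_count <= 999:
        return "100-999 edits"
    return "1000+ edits"


def country_from_geoip_cookie(cookie_value: Optional[str]) -> str:
    """Return the first colon-separated field of the cookie, or "Unknown"."""
    if not cookie_value:
        return "Unknown"
    first_field = cookie_value.split(":", 1)[0].strip()
    return first_field or "Unknown"
```

For example, `edit_count_bucket(37)` falls into "5-99 edits", and a missing cookie yields "Unknown".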

Iterate and ping people to get feedback and try to get it as complete as we can so that we don't have to change it once deployed.

Event Timeline

(also asked in T107592 )
Will there be a dismissal button for those who decide they don't want to participate after seeing the question? If yes, should the schema account for that action?

AFAICT this schema is meant to track a user's response to a survey. If we wanted to track a user's engagement with a survey – which we do, right? – then that should be tracked with a different schema, e.g. Schema:QuickSurveysEngagement.

@Tbayer We'll consider adding a dismissal cross at the top right or a default dismissal answer; we'll consult with design.

In any case, having an "n/a" or a "No opinion" answer on the survey would also work for dismissing for the moment.

Since surveys are supposed to be anonymous, I think we can only track dismissal on the client side.
Also, "No opinion" dismisses the current survey. What about the future surveys? What if the user doesn't want surveys at all?

@phuedx For the moment we're not tracking engagement explicitly, since knowing the page views, the bucket size, and the number of responses should give us a good estimate of engagement.

If we need to track really closely how many times the surveys were "actually" seen, then we'll address that in the future. We have to be very careful, because depending on the bucket size defined for a survey and the page views that wiki gets, it could mean killing event logging real soon.
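The estimate described here can be sketched as simple arithmetic. This is only an illustration under stated assumptions: that the bucket size is expressed as a fraction of page views between 0 and 1, and that every sampled page view actually shows the survey. The function names are hypothetical.

```python
def estimated_impressions(page_views: int, bucket_size: float) -> float:
    """Estimate how many times a survey was shown.

    Assumes bucket_size is the sampled fraction of page views (0.0 to 1.0).
    This is also roughly the number of extra events that logging every
    impression would generate.
    """
    if not 0.0 <= bucket_size <= 1.0:
        raise ValueError("bucket_size must be between 0 and 1")
    return page_views * bucket_size


def estimated_engagement(responses: int, page_views: int, bucket_size: float) -> float:
    """Estimate the share of shown surveys that received a response."""
    impressions = estimated_impressions(page_views, bucket_size)
    if impressions == 0:
        return 0.0
    return responses / impressions
```

With made-up numbers, 1,000,000 page views and a bucket size of 0.01 give about 10,000 impressions; 250 responses would then correspond to an engagement of about 2.5%. The same impression figure shows why logging each impression scales with page views and could overload event logging on a busy wiki.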

@bmansurov: What if the user doesn't want surveys at all?

We may have to address this before prod deployment, we'll see.

@Jhernandez: If we need to track really closely how many times the surveys were "actually" seen, then we'll address that in the future. We have to be very careful, because depending on the bucket size defined for a survey and the page views that wiki gets, it could mean killing event logging real soon.

Wouldn't be the first time @Jhernandez!

@bmansurov, @Jhernandez: I've just tweaked the schema very slightly so that the editCountBucket property isn't required to be submitted when the user isn't logged in.
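A minimal sketch of what this tweak means for validation, assuming the event is a plain dictionary. The key name `isLoggedIn` and the function name are hypothetical, and whether the bucket stays mandatory for logged-in users is an assumption drawn from the wording above, not from the deployed schema.

```python
from typing import List

EDIT_COUNT_BUCKETS = (
    "0 edits",
    "1-4 edits",
    "5-99 edits",
    "100-999 edits",
    "1000+ edits",
)


def edit_count_bucket_errors(event: dict) -> List[str]:
    """Check editCountBucket, which may be omitted for anonymous users."""
    errors = []
    bucket = event.get("editCountBucket")
    # Assumption: logged-in users must still report a bucket.
    if event.get("isLoggedIn") and bucket is None:
        errors.append("editCountBucket is required for logged-in users")
    # When present, the value must be one of the enum labels.
    if bucket is not None and bucket not in EDIT_COUNT_BUCKETS:
        errors.append("unknown editCountBucket: %r" % (bucket,))
    return errors
```

An anonymous event without the field passes, while a logged-in event without it is flagged.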

👍 Sounds good.

What do you guys think about making "platform" and "platformVersion" free-form strings instead of enumerations? I think @dr0ptp4kt suggested this to avoid schema changes because of these two fields and to remain flexible; see T107592#1498926.

What do you think?

The [MobileWebSearch schema](https://meta.wikimedia.org/wiki/Schema:MobileWebSearch) uses an enum for the platform property and a free-form string for the platformVersion property. It's conceivable that we might want to vary the latter but we can't really vary the former.

@phuedx should we then add "other" to the platform enum as a *catch-all* for platforms we haven't predicted? Or is that premature optimization?

How painful is it to update the schema afterwards? I've never faced that situation, so I'm not sure how to judge this.

I wonder if the apps team have plans on implementing this feature. I think it's a premature optimization. We shouldn't worry about it too much because schemas are versioned.

A change of the schema's revision number in includes/MobileFrontend.hooks.php and an appropriate change to the associated code (read: as close to a trivial change as we can get).

Ok, then let's move on with this. I'm calling it done unless you guys feel there's something else to be fleshed out.

Jdlrobson subscribed.

I would like to suggest we reopen this.

If we are going to make desktop/mobile things easier to build in future we should move away from distinguishing between desktop and mobile. We have user agent that allows us to do analysis on mobile specific devices so these are misleading.

I would suggest we replace:
String: platform (clients would specify "desktop", "mobile web", "app" etc)
with [web, app]

And instead of:
String: platform version (clients would specify "alpha", "beta", "stable", "prototype", etc)
Let's use "skin" since this is the only way desktop and mobile site should be thought of as differing.

Out of interest, what are these fields expected to be used for? Knowing a user story and a driving motivator for this work will educate us better on what needs to be done.

@Jdlrobson, it's nice to precompute these values client side so as to simplify the aggregate analysis for the people running queries. The fields are all different dimensions for group by / rollup type reports so we can gain a sense for user perception depending on those dimensions.
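To make the "group by / rollup" idea concrete, here is a small Python sketch that counts responses per combination of dimensions. The sample events, the response keys, and the `rollup` helper are all hypothetical; `surveyResponseValue` is the response field name used elsewhere in this discussion.

```python
from collections import Counter
from typing import Iterable, Sequence


def rollup(events: Iterable[dict], dimensions: Sequence[str]) -> Counter:
    """Count responses grouped by the given dimensions plus the answer."""
    counts = Counter()
    for event in events:
        key = tuple(event.get(dim, "Unknown") for dim in dimensions)
        counts[key + (event["surveyResponseValue"],)] += 1
    return counts


# Made-up events, for illustration only.
sample_events = [
    {"platform": "web", "presentation": "minerva", "surveyResponseValue": "answer-yes"},
    {"platform": "web", "presentation": "vector", "surveyResponseValue": "answer-yes"},
    {"platform": "web", "presentation": "minerva", "surveyResponseValue": "answer-no"},
]

by_platform = rollup(sample_events, ["platform"])
by_platform_and_presentation = rollup(sample_events, ["platform", "presentation"])
```

Because the dimensions are already present in each event, the person running the query only has to pick which ones to group by, with no user-agent parsing involved.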

Addition of skin as another field seems like an okay idea for making aggregate analysis more replete. Does that need to block implementation, or would that be something that would make more sense down the road?

I'm confused. What are you trying to compute exactly that my proposed updates wouldn't give you? A concrete example would help.

Suppose Minerva works on the desktop. If it says "web" and "Minerva" I don't know if the user is on a desktop device or a mobile device without having to parse the UA.

Firstly, right now you can't use Minerva as a desktop skin, as you point out (thus YAGNI) :)

Secondly, you can't. If you use the mobile site on desktop, you are currently incorrectly bucketed as a mobile user. We have no way to distinguish the two.

If all you care about is what site they are on then you can use webhost.

@Jdlrobson, the example was meant to conceive of a future where Minerva is tailored for a desktop mode (one step above tablet, which it supports great), presumably running on <lang>.<project>.org but even potentially on <lang>.m|zero.<project>.org.

This said, I think I've figured out where the hold up is. Although the current schema is sufficient in a sense, the normalization and enums could be defined more clearly. Here's probably what we need in order to ensure it's easy to query yet capture the interesting pieces.

platform: web|iosapp|androidapp
platformVersion: stable|beta|alpha|prototype
formFactor: desktop|tablet|phone|wearable
skin: (optional string)

The platform and, for the most part, formFactor are generally derivable from the UA, but it's more straightforward to have the client determine this and tell the server.

The platformVersion ("stage" may be a better term) is a pretty simple one.

The skin has meaning in some platforms but not others. But nonetheless, it can play a role in perception and thus in the respondent's answer.
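The four proposed fields could be modelled roughly as follows. This is a sketch of the proposal as written in this comment, not of the schema that was eventually agreed; the class names and the `make_dimensions` helper are hypothetical.

```python
from enum import Enum
from typing import Optional


class Platform(Enum):
    WEB = "web"
    IOS_APP = "iosapp"
    ANDROID_APP = "androidapp"


class PlatformVersion(Enum):
    STABLE = "stable"
    BETA = "beta"
    ALPHA = "alpha"
    PROTOTYPE = "prototype"


class FormFactor(Enum):
    DESKTOP = "desktop"
    TABLET = "tablet"
    PHONE = "phone"
    WEARABLE = "wearable"


def make_dimensions(platform: str, platform_version: str,
                    form_factor: str, skin: Optional[str] = None) -> dict:
    """Build the dimension fields; unknown enum values raise ValueError."""
    dimensions = {
        "platform": Platform(platform).value,
        "platformVersion": PlatformVersion(platform_version).value,
        "formFactor": FormFactor(form_factor).value,
    }
    # skin is an optional free-form string, so it is only set when given.
    if skin is not None:
        dimensions["skin"] = skin
    return dimensions
```

Keeping each dimension in its own field means a query can group by form factor or stage directly, at the cost of a schema change whenever a new enum value is needed.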

Let's catch up on Monday on video.

Jhernandez raised the priority of this task from Medium to High. Aug 17 2015, 9:16 AM

The concerns and discussion are worthwhile. Moving to TODO until we clear it out.

I've scheduled a meeting with @Jhernandez and @Jdlrobson to go through these items and determine if near term actions are required.

We talked.
We agreed to simplify platform to "web" or "app".
Platform version would be replaced with "presentation", which would take an arbitrary required string. On web this would be used to send the skin name, e.g. vector / vector-beta / minerva / minerva-beta.

We noted that in future we may want to send the width of the screen to get an idea of screen resolution, given that this can impact presentation, but we would defer this until later.

So "platform" will be a mandatory enum value with "web", "androidapp", and "iosapp" as options.

"presentation" is a string. In the description of this field, maybe try this:

For web, specify the skin name. For apps, specify the form factor and the stage (i.e., {tablet|phone|wearable}-{stable|beta|alpha|prototype}) such as "tablet-alpha".
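The app convention described above can be sketched in Python as a pair of helpers. The function names are hypothetical; only the allowed form factors, the stages, and the "tablet-alpha" shape come from the field description.

```python
from typing import Optional, Tuple

APP_FORM_FACTORS = ("tablet", "phone", "wearable")
STAGES = ("stable", "beta", "alpha", "prototype")


def app_presentation(form_factor: str, stage: str) -> str:
    """Build the presentation string for apps, e.g. "tablet-alpha"."""
    if form_factor not in APP_FORM_FACTORS:
        raise ValueError("unknown form factor: %r" % (form_factor,))
    if stage not in STAGES:
        raise ValueError("unknown stage: %r" % (stage,))
    return "%s-%s" % (form_factor, stage)


def parse_app_presentation(presentation: str) -> Optional[Tuple[str, str]]:
    """Split an app presentation string back into (form factor, stage).

    Returns None when the string does not follow the app convention,
    which is the case for web skin names.
    """
    form_factor, separator, stage = presentation.partition("-")
    if not separator or form_factor not in APP_FORM_FACTORS or stage not in STAGES:
        return None
    return form_factor, stage
```

Since a web skin name can also contain a hyphen, an analyst would still want to check "platform" before interpreting "presentation" this way.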

@bmansurov, based on discussion with @Jhernandez and @Jdlrobson, I updated the description. "presentation" should be a string specified by the client.

It makes sense that the response (surveyResponseValue) is recorded in a language-neutral way. But based on experiences with past surveys, one would really like to also record the language in which the survey was taken. E.g. sometimes there turn out to be translation issues that affect the response data in a particular language, or one might be interested in how reader experiences differ across languages.

@Tbayer, would the event capsule's site language fill this need?

The site language can differ from the user's interface language (https://www.mediawiki.org/wiki/Manual:Language - BTW I hope that that page is up to date regarding mobile); and the question will be displayed in the interface language (correct?). This difference was relevant in the 2011/12 editor surveys, for example. Plus, doesn't the capsule contain the project database name only - "enwiki" etc.? It can be a bit of a pain to extract language names from that (will the survey researcher need to come up with a regex that distinguishes e.g. zh-min-nanwiki from commonswiki?).
Also, going forward, we may want to use quick surveys on multilingual sites like Commons as well.
I realize that our goal right now is to get a MVP into production quickly, and that we won't really need this field if we run a first survey English-only, for example. That said, it's surely preferable not to have to update the schema later. And in any case, if we need to cut such corners now, we should document that so they don't hit us as multilingual bugs later.
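To illustrate the extraction problem mentioned above, here is a rough Python sketch of deriving a language code from a project database name. Everything in it is an assumption made for the example: the suffix list, the set of non-language prefixes, and the use of underscores or hyphens inside database names are illustrative and not an authoritative inventory.

```python
import re
from typing import Optional

# Assumption: project suffixes used by per-language wikis (not exhaustive).
PROJECT_SUFFIXES = (
    "wiktionary", "wikibooks", "wikinews", "wikiquote",
    "wikisource", "wikiversity", "wikivoyage", "wiki",
)

# Assumption: prefixes of multilingual or special wikis that are not languages.
NON_LANGUAGE_PREFIXES = {"commons", "meta", "species", "wikidata", "mediawiki"}

DB_NAME_RE = re.compile(
    r"^(?P<prefix>[a-z_\-]+?)(?P<suffix>" + "|".join(PROJECT_SUFFIXES) + r")$"
)


def language_from_db_name(db_name: str) -> Optional[str]:
    """Guess the language code from a database name, or None if there is none."""
    match = DB_NAME_RE.match(db_name)
    if match is None:
        return None
    prefix = match.group("prefix")
    if prefix in NON_LANGUAGE_PREFIXES:
        return None
    # Normalize underscores so "zh_min_nan" and "zh-min-nan" agree.
    return prefix.replace("_", "-")
```

Even this small sketch needs two hand-maintained lists, and it still says nothing about the interface language the respondent actually saw, which is the argument for recording the language explicitly.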

+1 million

At the very least this should be spun out into a task in the QuickSurveys project but I'd prefer to incorporate past experience now, while we're actively working on the schema definition.

@Tbayer, thanks, and @phuedx, good point. @Jhernandez, would you please review this for prioritization and queue it up as appropriate?

@Tbayer, @phuedx, @dr0ptp4kt, @Jdlrobson I've created T109571 as a follow-up, thanks for all the comments! Please chime in on that one; we'll keep it on hold for a bit and then move it along to get it done soon.

I've reflected the changes on platform+presentation and added a field for the user's language.

Thanks a lot, closing this one as done.