
Analytics schema for edit action feed
Closed, Resolved · Public

Description

Android is working on a feature loosely called the "Edit action feed" to be released end Q3. We want to collect quantitative data on the success of the feature, which calls for some fun schemas.

We care about:

Usage

  • Measure the length of sessions and the volume of edits made within the different app editing tasks. (We want to know how many cards deep into the feed people look vs. how many edits they actually make, to see whether the suggestions we're presenting are compelling. We're also interested in whether people do this for long stretches at a time, as another signal of how compelling they find the content.)

Quality

  • The rate of reverts from edits made using the task list UI is low (meaning: at or below the overall revert rate for the same types of edits)

Increase in editing activity by Android app users

  • Increase in Android app editor 2nd month retention
  • Increase of Android app editors editing outside of the app

Design spec sent to @mpopov via email.

Event Timeline

@mpopov - Ping. We're trying to go out by the end of March so a spec would be helpful. :)

@Charlotte: thanks for the ping! I'm working on acquiring data for T213458 and it turned out to be much, much harder than I anticipated.

I will say that, based on my past analysis, the feed/card analytics was broken and would need to be overhauled. I'm going to meet with Dmitry in the next couple of weeks to go over that re-spec plus this one.

Thanks. As long as we are sure we can get measurement in place + tested working by release at the end of March this is fine.

Usage

Measure the length of sessions and the volume of edits made within the different app editing tasks. (We want to know how many cards deep into the feed people look vs. how many edits they actually make, to see whether the suggestions we're presenting are compelling. We're also interested in whether people do this for long stretches at a time, as another signal of how compelling they find the content.)

We will have two schemas: one for tracking interactions with the intro cards as they are presented to the user (e.g. when user unlocks the suggested edits feature and they can tap "maybe later" or "get started") and one for summarizing their sessions within the Editor tasks screen.

Spec draft for MobileWikiAppSuggestedEdits, which provides a summary of each session within the Editor tasks screen:

  • session_token (should this go by another name for technical reasons?)
    • This is the token which was generated for the overall session in MobileWikiAppSessions, NOT a token generated each time user goes into the Editor tasks screen
    • Enables us to link multiple usages of Suggested Edits together and then also with the summary of the session
    • That is, each time the user exits out of the Editor tasks screen, an event is sent summarizing that "sub-session", but as long as the user is within the same overall session, all of those summary events will have the same token
  • time_spent
    • How much time was spent in the feature in this particular instance
    • Expected value is 0 or 1 if the user enters the Editor tasks screen and then immediately exits back to the feed
    • Since the user can go into reading mode when adding a description to the article (there's now an option to read the article), we should pause the timer and only resume it when the user hits the back button and returns to the Editor tasks screen
    • BUT if they hit the <W button to go back to the feed, the sub-session ends and the summary event is sent
    • Notes
      • Summing time_spent across all summary events within the same app session is how much time the user spent with the feature
      • If total_time_spent is the sum of time_spent across all summary events within the same app session, then total_time_spent / length (as in the length field from MobileWikiAppSessions) is the proportion of the overall session that the user spent using the Suggested Edits feature
  • help_opened
    • Counter indicating how many times the user looked at Help in this particular instance
    • Any time counts, whether it's Help from the Editor tasks screen or within task or while making a suggested edit
  • scorecard_opened
    • Counter indicating how many times the user looked at their contribution scorecard in this particular instance
  • from_onboarding (may need to change name for clarity?)
    • Boolean flag indicating whether the user entered the Editor tasks screen from an onboarding card
    • Will be false for the majority of events, as most of the time the user will enter the Editor tasks screen from the menu
    • This is a separate field so it can be excluded from EL whitelisting, since we'll want to hash the cross-schema session_token & app_install_id and keep edit_tasks. The user_id field will need to be purged after 90 days so it can't be used to link events across quarters, per AE guidelines
  • edit_tasks is the main course here
    • JSON object of per-language, per-task stats if the user looked at 1 or more suggestions
    • null if the user entered the screen and then backed out, i.e. no suggestions were looked at; or the empty-set pattern (described below) if that's easier (probably easier), e.g. {"en":{}, "ru":{}}
    • Each language is a key in the object, even if the user did not look at any suggestions in that language
      • If I have English and Russian in my language settings but I only interacted with English suggestions to add descriptions, this would look like: { "en":{ "add-description":{ "imps":10, "clicks":5, "cancels":2, "successes":3, "failures":0 } }, "ru":{} }
    • For translations, we would have from>to as the name: { "en>ru": { "edit-description":{…stats…}, "edit-caption":{…stats…} }}
    • Each task will have the following data:
      • impressions (imps in the examples) is the total number of suggestions shown
      • clicks is the total number of times the user accepted a suggestion
      • cancels is the total number of times the user accepted a suggestion but then changed their mind
      • successes is the total number of successful publications
      • failures is the total number of failed publications (e.g. the connection timed out or the edit didn't go through for whatever reason); note: the user can either keep retrying or go back, in which case it's counted as a cancel AND a failure
    • Notes
      • impressions - clicks is the total number of skips
      • clicks/impressions is the suggestion acceptance rate and (impressions - clicks)/impressions is the suggestion rejection rate (aka skip rate)
      • cancels + successes = clicks, since each accepted suggestion can only end in two possible ways (cancel or publish)
      • for translation task, the stats are for the language they're translating to, not the language they're translating from
      • Summing successes across all task types within language is the number of contributions made in that language
      • Summing successes across all task types and all languages is the number of contributions the user made in 1 usage of Suggested Edits feature
      • Summing successes across all tasks, all languages, and all summary events is the number of contributions the user made in their overall session of using the app
  • suggested_edits_version
    • This is to track which version of the feature the user engaged with
    • Initially 1.0 but then we can try variations of the interface (e.g. 2a vs 2b)
  • suggestions_api_version
    • This is to track which version of RI's backend API the feature utilized
    • If the backend is changed (e.g. suggestions are made using a personalized recommender system / machine learning model predictions), we want to see how that changes user engagement with the feature
    • Or if we play with unlocking thresholds (done on server-side, not client-side as I understand) and see how they affect contributions/usage of the feature (e.g. "1.0-relaxed" vs "1.0-stricter")
  • app_install_id and client_dt, as standard
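The time-accounting and counting notes above can be sketched as follows. This is only an analysis-side illustration; the event values and the session length are made up, and the field names follow the spec draft:

```python
# Hypothetical summary events from ONE overall app session; all share the
# session_token generated in MobileWikiAppSessions.
summary_events = [
    {"session_token": "abc123", "time_spent": 45},
    {"session_token": "abc123", "time_spent": 0},   # entered, immediately exited
    {"session_token": "abc123", "time_spent": 120},
]

# Total time spent in the Suggested Edits feature during this app session.
total_time_spent = sum(e["time_spent"] for e in summary_events)

# 'length' as recorded in MobileWikiAppSessions for the same session_token
# (made-up value).
session_length = 600

# Proportion of the overall session spent using Suggested Edits.
proportion = total_time_spent / session_length

print(total_time_spent, round(proportion, 3))  # 165 0.275
```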

Example edit_tasks of a fruitful session with Suggested Edits feature:

{
  "en" : {
    "edit-description" : {
      "imps" : 5,
      "clicks" : 3,
      "cancels" : 1,
      "successes" : 2,
      "failures" : 0
    },
    "edit-img-caption" : {
      "imps" : 5,
      "clicks" : 3,
      "cancels" : 1,
      "successes" : 2,
      "failures" : 0
    }
  },
  "ru" : {
    "edit-description" : {
      "imps" : 5,
      "clicks" : 3,
      "cancels" : 1,
      "successes" : 2,
      "failures" : 0
    },
    "edit-img-caption" : {
      "imps" : 5,
      "clicks" : 3,
      "cancels" : 1,
      "successes" : 2,
      "failures" : 0
    }
  },
  "en>ru" : {
    "edit-description" : {
      "imps" : 5,
      "clicks" : 3,
      "cancels" : 1,
      "successes" : 2,
      "failures" : 0
    },
    "edit-img-caption" : {
      "imps" : 5,
      "clicks" : 3,
      "cancels" : 1,
      "successes" : 2,
      "failures" : 0
    }
  },
  "ru>en" : {
    "edit-description" : {
      "imps" : 5,
      "clicks" : 3,
      "cancels" : 1,
      "successes" : 2,
      "failures" : 0
    },
    "edit-img-caption" : {
      "imps" : 5,
      "clicks" : 3,
      "cancels" : 1,
      "successes" : 2,
      "failures" : 0
    }
  }
}

or the minified version:

{"en":{"edit-description":{"imps":5,"clicks":3,"cancels":1,"successes":2,"failures":0},"edit-img-caption":{"imps":5,"clicks":3,"cancels":1,"successes":2,"failures":0}},"ru":{"edit-description":{"imps":5,"clicks":3,"cancels":1,"successes":2,"failures":0},"edit-img-caption":{"imps":5,"clicks":3,"cancels":1,"successes":2,"failures":0}},"en>ru":{"edit-description":{"imps":5,"clicks":3,"cancels":1,"successes":2,"failures":0},"edit-img-caption":{"imps":5,"clicks":3,"cancels":1,"successes":2,"failures":0}},"ru>en":{"edit-description":{"imps":5,"clicks":3,"cancels":1,"successes":2,"failures":0},"edit-img-caption":{"imps":5,"clicks":3,"cancels":1,"successes":2,"failures":0}}}

This comes to 676 characters, which is a rather big payload, BUT it represents someone with 2 languages who has engaged with every aspect of the feature (in a future where image captions are possible) in ONE session:

  • added descriptions AND image captions for both en AND ru
  • translated descriptions AND image captions between en and ru, in both directions

So we should be fine for 100% or nearly 100% of the cases :D
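As a sanity check on the notes above (skips = impressions − clicks; cancels + successes = clicks), here is a minimal sketch that derives those quantities from an edit_tasks payload, using the smaller example from the spec draft; the keys follow the example's imps/clicks/cancels/successes/failures naming:

```python
import json

# edit_tasks for a user with English and Russian configured who only
# interacted with English "add description" suggestions.
edit_tasks = json.loads(
    '{"en": {"add-description": {"imps": 10, "clicks": 5,'
    ' "cancels": 2, "successes": 3, "failures": 0}}, "ru": {}}'
)

total_successes = 0
for lang, tasks in edit_tasks.items():
    for task, s in tasks.items():
        skips = s["imps"] - s["clicks"]            # impressions - clicks
        acceptance_rate = s["clicks"] / s["imps"]  # clicks / impressions
        # Each accepted suggestion ends in exactly one of: cancel, publish.
        assert s["cancels"] + s["successes"] == s["clicks"]
        total_successes += s["successes"]

# Summing successes across all task types and all languages gives the number
# of contributions made in this one usage of the Suggested Edits feature.
print(total_successes)  # 3
```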

Spec draft for MobileWikiAppSuggestedEditsUnlocks, which tracks interactions with the onboarding(?) cards as they are shown to the user after fulfilling criteria:

(the whitelisting notes are for me)

  • action: "maybe later", "get started"
    • not whitelisted to keep
  • task_type (e.g. "add-description")
    • whitelisted to keep
  • unlocked_previously
    • Boolean flag indicating whether the task was unlocked retroactively because the user had already fulfilled the criterion prior to launching a newly installed copy (true, e.g. if they're reinstalling the app or launching it on a different device), or whether the task was unlocked "naturally" upon fulfilling the criterion (false)
    • Note: unfortunately the backend won't check whether the user has already fulfilled the criteria (e.g. an experienced editor who has made many description edits and is using the app for the first time); it only keeps a count of how many description edits the user has made in the app once the API launches
    • whitelisted to keep
  • user_id
    • JSON object of user ID(s), initially just the user's ID on Wikidata; e.g. {"wikidata":XXXX}
    • eventually Commons (when image captions are added as task); e.g. { "wikidata": XXXX, "commons": YYYYY}
    • whitelisted to keep
  • suggested_edits_version (see above)
  • suggestions_api_version (see above)
  • session_token (overall, see above)
    • NOT whitelisted to keep/hash
  • app_install_id as standard
    • NOT whitelisted to keep/hash
  • client_dt as standard
    • whitelisted to keep/hash

By keeping the user_id and not keeping or hashing the cross-schema identifiers session_token and app_install_id, we will have an archive from the start of the feature of how many Android app editors have unlocked which tasks and whether the unlock was retroactive or "natural".
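The hashing of cross-schema identifiers mentioned above might work roughly like this. Purely illustrative: the keyed-hash approach, salts, and rotation schedule are assumptions on my part; actual EventLogging sanitization is handled by the purging pipeline, not by client code:

```python
import hashlib
import hmac

def hash_identifier(value: str, salt: bytes) -> str:
    # Keyed hash: identifiers remain joinable across schemas while the same
    # salt is in use, but cannot be linked once the salt is rotated.
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical per-quarter salts.
salt_q1 = b"quarterly-salt-1"
salt_q2 = b"quarterly-salt-2"

token = "abc123"  # a session_token or app_install_id
h1 = hash_identifier(token, salt_q1)
h2 = hash_identifier(token, salt_q2)

assert h1 == hash_identifier(token, salt_q1)  # joinable within a quarter
assert h1 != h2                               # unlinkable across quarters
```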

Quality

The rate of reverts from edits made using the task list UI is low (meaning: at or below the overall revert rate for the same types of edits)

When we calculate the revert rate, we will need to be able to differentiate revisions made in Edit Tasks from revisions made in app but outside of that feature, so any description addition/translation that is made as part of Suggested Edits should be marked as such.

Inspired by the QuickStatements tool – which marks revisions with #quickstatements – we can use the summary field of the [[ https://www.wikidata.org/w/api.php?action=help&modules=wbsetdescription | wbsetdescription API ]] calls (Dmitry told me we currently don't set it, but can easily start) to mark description edits made with the Suggested Edits feature with "Added from Suggested Edits (v1.0) in Wikipedia Android app". This will then show up in the revision comment. Note: description edits made by the app are already tagged as "mobile app edit" and "android app edit". Note for future reference: the same applies when image captions (and image caption translations) are added as a task type; those revisions will also need the same comment.
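For concreteness, the parameter set for such a call might look like the sketch below. The helper function and entity/token values are hypothetical; the parameter names (action, id, language, value, summary, token) are the standard wbsetdescription parameters, and only the summary string comes from the proposal above:

```python
# Revision-comment marker proposed above for Suggested Edits description edits.
SUGGESTED_EDITS_SUMMARY = "Added from Suggested Edits (v1.0) in Wikipedia Android app"

def wbsetdescription_params(entity_id: str, language: str,
                            value: str, csrf_token: str) -> dict:
    """Build the POST parameters for a wbsetdescription call that marks the
    revision as coming from the Suggested Edits feature."""
    return {
        "action": "wbsetdescription",
        "id": entity_id,
        "language": language,
        "value": value,
        "summary": SUGGESTED_EDITS_SUMMARY,  # shows up in the revision comment
        "token": csrf_token,
        "format": "json",
    }

# Hypothetical usage (entity ID, description, and token are placeholders).
params = wbsetdescription_params("Q42", "en", "English writer and humorist", "TOKEN")
```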

Furthermore, since the user can start reading an article when they're adding a description, MobileWikiAppSessions should have a fromSuggestedEdits field to count how many pages the user started reading from this feature.

Metric goals

  • Increase in editing activity by Android app users and (from the design brief) increase Android app editor retention
    • Using the user_id information in the two Suggested Edits schemas, as well as the additional "suggested app edit" tag, we will be able to compare:
      • in-app editing activity of users who use the feature with that of those who do not
      • retention rates of new editors who are exposed to the feature vs. those who are not
  • Increase of Android app editors editing outside of the app
    • Using the user_id information in the two Suggested Edits schemas, we will be able to assess how usage of the Suggested Edits feature affects editing activity outside of the app

Because the feature must be unlocked by the user after fulfilling the first criterion, I suspect only a small proportion of editors will be exposed to it. It therefore makes sense to look at unlock rates and then compare the editing activity of editors by exposure/usage of the feature, rather than gauge whether the overall metric moved following the release. That is, if the feature is a great success with 10% of users, it's unlikely we would see a significant impact on metrics calculated from all Android app editors.

After talking with Chelsy and thinking even more about this over the weekend I'm gonna simplify the specs. Once Robin returns I'll talk with him about data he'd be interested in with respect to user experience flow to aid future redesigns/adjustments.

Sounds great @mpopov, let’s schedule a 1:1 call later in the week to talk about it?