
Better publishing of Annotations about Data Issues
Open, Normal, Public

Description

While experiencing issues like T141506, we realized there's an opportunity to publish annotations that explain spikes and drops in different metrics we're responsible for. Some possible solutions:

  • publishing these annotations in an API that would parallel the pageviews, uniques, etc. api endpoints.
  • just updating the dashiki annotations (eg. https://meta.wikimedia.org/wiki/Dashiki:PageviewsAnnotations) and making it easier to fetch them / discover them from other tools (like the tools.wmflabs.org/pageviews tool)

Event Timeline

Restricted Application added a subscriber: Aklapper. Aug 8 2016, 5:02 PM
MusikAnimal added a subscriber: MusikAnimal.

@MusikAnimal: happy to work with you on this if you'd like, to understand what would be easier from your point of view.

Certainly! Thank you :) If we could get the annotations in a way that mirrors the pageviews API endpoints, as you say, that would be ideal. For example, we'd only want to see info about the Main_Page anomaly when querying desktop pageviews for the Main Page itself or for the entire project.

Also, what about situations where annotations are provided, but the data is repaired at a later point? For instance, when T128295 happened I manually added a notice to the tool whenever a user queried an affected page during the affected date range, but eventually the data was repaired and I was able to remove the notice altogether. Similarly, my hope is that we can either remove such annotations from the API or mark them as "repaired", so the Pageviews tool knows not to show them.
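The "repaired" idea above could be modeled with a status field on each annotation entry, so tools filter out issues whose data was later backfilled. This is a minimal sketch; the field names and values are illustrative assumptions, not a settled schema.

```python
# Hypothetical annotation entries with a "status" field so consumers
# like the Pageviews tool can skip issues that were later repaired.
from datetime import date

annotations = [
    {
        "start": date(2016, 3, 1),
        "end": date(2016, 3, 8),
        "note": "Pageviews undercounted for titles with special characters",
        "status": "repaired",  # data was backfilled; tools can hide this
    },
    {
        "start": date(2016, 8, 5),
        "end": date(2016, 8, 8),
        "note": "Main_Page desktop pageview spike caused by a bot",
        "status": "open",
    },
]

def visible_annotations(annotations):
    """Return only annotations that still apply to the published data."""
    return [a for a in annotations if a["status"] != "repaired"]
```

With this shape, "removing" an annotation never loses history: the entry stays on the wiki page, and only its status changes.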

T128295 is actually an interesting scenario for an annotations API: there, the message should only be returned for pages with certain special characters in their titles. It seems that as the API grows, you may need to add more and more layers of logic to determine what to return. If we can do this, that would be incredible... The goal, of course, is to only show annotations when we are confident they are relevant to the data the user has requested.

Nuria moved this task from Incoming to Backlog (Later) on the Analytics board.Aug 15 2016, 3:36 PM
Milimetric added a comment.EditedAug 17 2016, 8:32 PM

Thanks, these are all great points. We're aiming to get to this next quarter.

But we don't need to make our logic too complicated. We can rely on the user a little to decide whether to take an annotation into account. So instead of dynamically toggling the annotation based on accents in page titles, we could just say "page titles with accents had a problem here", and leave more general things, like the time span and whether or not the problem was fixed, as part of the annotation metadata.
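The "keep the logic simple" approach above could reduce to a single server-side check: does the annotation's time span overlap the queried range? Judging finer relevance (e.g. whether accented titles matter to this query) is left to whoever reads the note. A minimal sketch, with assumed field names:

```python
# Server-side filtering kept deliberately simple: only a date-range
# overlap test; the note text carries the nuance for the reader.
from datetime import date

def overlapping(annotations, query_start, query_end):
    """Annotations whose [start, end] span intersects the queried range."""
    return [
        a for a in annotations
        if a["start"] <= query_end and a["end"] >= query_start
    ]

notes = [
    {"start": date(2016, 3, 1), "end": date(2016, 3, 8),
     "note": "Page titles with accents had a problem here"},
]

hits = overlapping(notes, date(2016, 3, 5), date(2016, 3, 20))
```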

Nuria triaged this task as Normal priority.Mar 20 2017, 4:23 PM
Nuria moved this task from Wikistats Production to Dashiki on the Analytics board.Apr 24 2017, 4:25 PM
Nuria added a subscriber: Nuria.Apr 24 2017, 4:29 PM

We feel that machine-readable, per-metric wiki annotations would work for this use case. We could also probably use a generic page for annotations that affect all metrics.

mforns added a subscriber: mforns.Apr 24 2017, 6:29 PM

I agree, the on-wiki JSON config pages seem like a good tool for storing annotations.

  • they can be edited easily without technical knowledge
  • don't need deployment, changes are immediate :]
  • what @Nuria said: human-readable and machine-readable

The problem I see now is that every metric/graph has its own annotations page. It's difficult to find all those pages, and if an annotation concerns more than one metric (which totally happens), the annotation has to be duplicated.

Maybe we could:

  • Move all annotations to a dedicated namespace, like: Config:Annotations: so they would be easier to find and list.
  • Reuse the same annotations page for all metrics that share data sources. For example: Config:Annotations:Webrequest for all metrics that use webrequests as data source.
  • If there are still annotations that concern only one of several metrics sharing a given data source, we could have a tree-like annotation structure:
Config:Annotations:Webrequest
    |___Config:Annotations:Pageviews
    |       |___Config:Annotations:PortalPageviews
    |___Config:Annotations:UniqueDevices

For that last proposal, we'd need to modify the mediawiki-storage lib (the one that reads those config pages in Dashiki) to accept some kind of hierarchical structure. I created a task for that: T163725
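The tree-like lookup above could work as follows: to annotate one metric, walk from its own page up to the shared data-source page, then merge everything root-first. The page names mirror the proposed Config:Annotations:* hierarchy, but the storage and note texts here are purely hypothetical:

```python
# Hypothetical in-memory stand-in for the wiki pages; in Dashiki these
# would be fetched via the mediawiki-storage lib.
pages = {
    "Config:Annotations:Webrequest": ["webrequest logs lost for 2h"],
    "Config:Annotations:Pageviews": ["pageview definition changed"],
    "Config:Annotations:PortalPageviews": [],
}

# Child page -> parent page, encoding the proposed tree.
parents = {
    "Config:Annotations:Pageviews": "Config:Annotations:Webrequest",
    "Config:Annotations:PortalPageviews": "Config:Annotations:Pageviews",
}

def annotations_for(page):
    """Collect annotations from `page` and all its ancestors, root first."""
    chain = []
    while page is not None:
        chain.append(page)
        page = parents.get(page)
    merged = []
    for p in reversed(chain):  # root first, leaf last
        merged.extend(pages.get(p, []))
    return merged
```

This way an annotation written once on the Webrequest page automatically shows up for every metric derived from webrequests.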

Just a warning: Config:Dashiki: is the prefix the Dashiki extension configures, so I think it's best to stick with that prefix, because I really don't want to make another extension. We could theoretically configure both Config:Dashiki: and Config:Annotations: in the same extension, but I feel like people would oppose that for being gross.

Nuria added a comment.Apr 27 2017, 2:57 PM

+1 to Config:Dashiki, we do not need another namespace.

Nuria moved this task from Dashiki to Backlog (Later) on the Analytics board.May 16 2017, 12:52 PM