
Load API request count and latency data from Hadoop to a dashboard
Open, High, Public

Description

Action API traffic data (counts, user agents, errors, backend latency) are collected in the ApiAction tables in Hadoop. Currently the only way to use them is by logging in to the stats box and manually running Hive queries, which is not too useful for product management. We should expose them somehow.
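For reference, the kind of query that currently has to be run by hand on the stats box looks roughly like the following. This is only a sketch: the column names (`params`, `timeSpentBackend`) and the year/month/day partitioning are assumptions about the wmf_raw.ApiAction schema, not taken from this task.

```sql
-- HiveQL sketch (assumed schema): daily request counts and mean
-- backend latency per Action API module. `params` is assumed to be a
-- map<string,string> of request parameters and `timeSpentBackend` a
-- per-request latency in milliseconds.
SELECT
  year, month, day,
  params['action']      AS api_action,
  COUNT(*)              AS request_count,
  AVG(timeSpentBackend) AS avg_backend_ms
FROM wmf_raw.ApiAction
WHERE year = 2016 AND month = 5
GROUP BY year, month, day, params['action']
ORDER BY request_count DESC
LIMIT 50;
```

A dashboard would essentially need this aggregation run on a schedule, which is what the ETL task (T137321) and the reportupdater suggestion below are about.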

This is probably, though not necessarily, blocked on T137321: Run ETL for wmf_raw.ActionApi into wmf.action_* aggregate tables (making the data collection more production-like).


Event Timeline

Tgr claimed this task.
Tgr raised the priority of this task to Needs Triage.
Tgr updated the task description.
Tgr added subscribers: Tgr, bd808.

Let us know when you figure out the metric / get it measured and we can help you make a dashboard.

mforns renamed this task from Load API request count and latency data from Hadoop to a dashboard (limn?) to Load API request count and latency data from Hadoop to a dashboard. May 30 2016, 4:41 PM
mforns subscribed.

@Tgr
This can easily be done with reportupdater, and it will show up on a dashiki instance.
We can help you with that.
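For anyone unfamiliar with reportupdater: it runs a query file on a schedule, substitutes a time window into it, and appends one row per time unit to a TSV that dashiki can then graph. A minimal config sketch follows; the host, file names, and report name are illustrative placeholders, not taken from this task, and the exact keys should be checked against the reportupdater documentation.

```yaml
# config.yaml sketch for a hypothetical reportupdater job.
# Each entry under `reports` maps to a query file of the same name in
# this directory; reportupdater substitutes {from_timestamp} and
# {to_timestamp} into the query and appends the results to
# output/api_request_counts.tsv for dashiki to consume.
databases:
  analytics:
    host: analytics-store.example   # placeholder host
    port: 3306
    creds_file: /etc/mysql/conf.d/research-client.cnf
    db: staging
defaults:
  db: analytics
reports:
  api_request_counts:
    granularity: days
    starts: 2016-05-01
```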

Jhernandez subscribed.

@Tgr Can you add a full description of what this is, and move it to the backlog if it is ours? Thanks

@Jhernandez, added some description. This originally came about when Developer Relations was planning a pivot towards external developers (i.e. people who use Wikimedia APIs for mashups but don't use Wikimedia code directly) and was interested in API usage / usability data (hence T102079: Metrics about the use of the Wikimedia web APIs). The pivot eventually did not happen; exposing API usage data still seems like the sensible thing to do, but I guess today the potentially interested party would be @EvanProdromou, as API PM? Also, back then Reading Infrastructure was the team closest to owning the API, so the ApiAction work was done by us. I have no idea how responsibilities are split today within the teams participating in the Better Use of Data CDP.

Anomie subscribed.

> Also, back then Reading Infrastructure was the team closest to owning the API so the ApiAction work was done by us. I have no idea how responsibilities are split today within the teams participating in the Better Use of Data CDP.

As far as I can tell, the Action API "ownership" went with me when I moved to the MediaWiki Platform team, and then that team became part of the Core Platform Team. Just like it came with me from MediaWiki Core to Reading Infrastructure (with a brief stop in the Wikimedia MediaWiki API Team) during the Reorg of Doom.

Evan is part of CPT too, and will presumably take over some of the Product Manager aspects of that ownership eventually.

On the other hand, this particular task is more "about" the API than actually within the scope of MediaWiki-Action-API. I don't know who might own WMF-specific dashboards that are done outside of MediaWiki. I don't know anything about the "Better Use of Data CDP".

That's Better use of data. I guess @kzimmerman would be the other person who might be able to help prioritize this and decide on ownership.

This is relevant to recent discussions about tracking content consumption, but Product Analytics hasn't dug into API use (yet).

Who are the key stakeholders associated with this task?

Jhernandez raised the priority of this task from Low to Needs Triage. Feb 28 2019, 10:53 AM

Thanks for clarifying, @Tgr @Anomie. I've moved it to tracking for Reading Infrastructure and reset the priority; it seems clear that we shouldn't currently own this, and the appropriate follow-up teams have been pinged on the task.

Aklapper removed Tgr as the assignee of this task. Jun 19 2020, 4:16 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

kzimmerman added a subscriber: sdkim.

@sdkim this is in Tracking for BUOD; is this still relevant for your team?

Also, with the addition of SQL Lab & Presto to Superset (https://superset.wikimedia.org/superset/sqllab) it's possible to make a dashboard based on that table directly, although it appears data is no longer being collected in it.

Screen Shot 2021-02-08 at 2.16.31 PM.png (screenshot, 207 KB)

Given that no data is being collected and this task has been stale for two years, I'd recommend closing it. @AMooney

The data does exist in Hadoop, but T137321: Run ETL for wmf_raw.ActionApi into wmf.action_* aggregate tables needs to be fixed to regenerate the aggregate tables.

LGoto triaged this task as High priority. Mar 24 2021, 6:37 PM