Top API user agents stats
Open, Normal, Public

Description

We would like to get a better understanding of the top API user agents, for both the REST & Action APIs. These statistics should cover both cache misses & hits.

We do have user agents in web request logs, but as far as I know we currently only expose UA stats aggregated across all requests. Would it be possible to set up separate UA stats for requests matching /w/api.php and /api/rest_v1/ (separately + combined)?
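To make the request concrete, here is a minimal sketch of the grouping being asked for. It assumes the input is a stream of (uri_path, user_agent) pairs pulled from the web request logs; that record layout and the function names are hypothetical, and only the two path prefixes come from this task.

```python
from collections import Counter

# Path prefixes taken from the task description.
ACTION_API_PREFIX = "/w/api.php"
REST_API_PREFIX = "/api/rest_v1/"


def classify_api(uri_path):
    """Return 'action', 'rest', or None for a request path."""
    if uri_path.startswith(ACTION_API_PREFIX):
        return "action"
    if uri_path.startswith(REST_API_PREFIX):
        return "rest"
    return None


def top_api_user_agents(requests, n=10):
    """Count user agents per API and across both APIs.

    `requests` is an iterable of (uri_path, user_agent) pairs. This layout
    is an assumption for illustration, not the actual webrequest schema.
    """
    counts = {"action": Counter(), "rest": Counter(), "combined": Counter()}
    for uri_path, user_agent in requests:
        api = classify_api(uri_path)
        if api is None:
            continue  # not an API request
        counts[api][user_agent] += 1
        counts["combined"][user_agent] += 1
    # Keep only the n most frequent user agents in each group.
    return {name: counter.most_common(n) for name, counter in counts.items()}
```

Because the counts are taken from the request stream rather than from backend logs, cache hits and misses would both be included, provided the input covers both.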

See also: T122245: REST API entry point web request statistics at the Varnish level

Restricted Application added a subscriber: Aklapper.Aug 4 2016, 8:57 PM
GWicke renamed this task from Top API user agents stat to Top API user agents stats.Aug 4 2016, 8:57 PM
Nuria added a comment.Aug 4 2016, 9:07 PM

That information exists for the PHP API in the API tables: https://wikitech.wikimedia.org/wiki/Analytics/Data/ApiAction

API data in webrequest is partial, so this data is published by the API itself.

hive (wmf_raw)> desc apiaction;
OK
col_name data_type comment
ts int
ip string
useragent string
wiki string
timespentbackend int
haderror boolean
errorcodes array<string>
params map<string,string>
year string
month string
day string
hour string

# Partition Information
# col_name data_type comment

year string
month string
day string
hour string

@Nuria, we are interested in all requests, including cache hits. Requests recorded by backends like the PHP API would not include those.

Nuria added a comment.Aug 4 2016, 9:14 PM

We do have user agents in web request logs, but as far as I know we currently only expose UA stats aggregated across all requests.

A small clarification: we expose UAs aggregated across pageviews rather than requests. A pageview that involves 1 HTML fetch and 20 JavaScript fetches is counted as "1 pageview" and thus reports "1 UA"; otherwise our UA reporting would over-represent browsers with JS support.
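The difference between the two aggregations can be sketched as follows. This is an illustration only: the (user_agent, is_pageview) record layout and the `is_pageview` flag are hypothetical stand-ins for however pageviews are actually identified.

```python
from collections import Counter


def ua_counts(records):
    """Return (per_request, per_pageview) user agent counters.

    `records` is an iterable of (user_agent, is_pageview) pairs; the flag
    marks the one request per pageview that should be counted (assumed).
    """
    per_request = Counter()
    per_pageview = Counter()
    for user_agent, is_pageview in records:
        per_request[user_agent] += 1
        if is_pageview:
            per_pageview[user_agent] += 1
    return per_request, per_pageview


# One pageview from a JS-capable browser: 1 HTML fetch plus 20 JS fetches.
js_browser = [("js-browser", True)] + [("js-browser", False)] * 20
# One pageview from a browser that fetches only the HTML.
text_browser = [("text-browser", True)]

per_request, per_pageview = ua_counts(js_browser + text_browser)
```

Per request, the JS-capable browser is counted 21 times against 1 for the other; per pageview, each is counted once.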

Anomie moved this task from Unsorted to Non-Code on the MediaWiki-API board.Aug 4 2016, 9:38 PM
Milimetric moved this task from Incoming to Backlog (Later) on the Analytics board.Aug 8 2016, 4:47 PM
Milimetric triaged this task as Normal priority.
Tgr added a subscriber: Tgr.Jan 9 2017, 5:50 AM

ApiAction is collected via varnishkafka and does include cached requests. Stats aggregation is T137321.

Nuria added a comment.Jun 15 2017, 4:09 PM

Ping @Tgr what is the status of this?

Tgr added a comment.Jul 4 2017, 6:00 PM

ApiAction is collected via varnishkafka and does include cached requests. Stats aggregation is T137321.

Actually, ApiAction is collected via the PSR-3 logger in MediaWiki, which sends Avro objects to Kafka. So it does not include cached requests (but does include requests which are not logged by Kafka, such as POST requests). Sorry for the misinformation.

Ping @Tgr what is the status of this?

Action API requests which hit a backend server are logged (in a way that enables top UA stats), but the logging code is hacky and needs to be finalized (T137321: Run ETL for wmf_raw.ActionApi into wmf.action_* aggregate tables). Logging cached API requests via varnishkafka and joining them with the existing stats is T155478: Copy cached API requests from raw webrequests table to ApiAction; as you said there, it probably should be done on top of webrequest tagging.

I don't know much about RESTBase stats. For ORES (not mentioned in the task but would be the next logical thing to add) we don't collect UA data at all AFAIK.

fdans assigned this task to Nuria.Oct 23 2017, 4:05 PM