Core REST API logs calls
Open, Needs Triage, Public

Description

Core REST API should log requests, for debugging and analysis.

Action API does this via ApiMain::logRequest. See https://gerrit.wikimedia.org/g/mediawiki/core/+/9d456b429d8b5265f1e134c9eca46f99318ee083/includes/api/ApiMain.php#1626
Calls are logged to api.log (unstructured message) and api-request.log (structured message). Sensitive information (such as CSRF tokens) is redacted from the log by being replaced with the string "[redacted]".
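A minimal sketch of the redaction step, written in Python purely for illustration (MediaWiki itself is PHP, and ApiMain's actual logic is not reproduced here). It assumes request parameters arrive as a flat name/value mapping and that sensitivity is decided by a fixed, hypothetical list of parameter names (SENSITIVE_PARAMS); a real implementation would more likely take this from each route's parameter definitions.

```python
from typing import Mapping

REDACTED = "[redacted]"

# Hypothetical list of sensitive parameter names, for illustration only.
SENSITIVE_PARAMS = frozenset({"token", "csrf_token", "password"})


def redact_params(
    params: Mapping[str, str],
    sensitive: frozenset = SENSITIVE_PARAMS,
) -> dict:
    """Return a copy of params with sensitive values replaced by "[redacted]".

    Names are compared case-insensitively; the input mapping is not modified.
    """
    return {
        name: REDACTED if name.lower() in sensitive else value
        for name, value in params.items()
    }
```

For example, redact_params({"title": "Foo", "token": "abc"}) keeps the title and replaces the token value with "[redacted]".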

For Core REST API, we will need to:

  • decide to which file(s) calls should be logged
  • if this involves new files, determine what, if anything, needs to be done for these files to exist in our production infrastructure and be properly mapped to logstash/kibana
  • implement logging
  • ensure that sensitive information is redacted
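The logging step could mirror the Action API's split between an unstructured and a structured log. A rough Python sketch, assuming two hypothetical logger channels ("rest" and "rest-request") standing in for api.log and api-request.log, and parameters that have already been redacted:

```python
import json
import logging

# Hypothetical channels mirroring api.log (text) and api-request.log (structured).
text_log = logging.getLogger("rest")
structured_log = logging.getLogger("rest-request")


def log_request(method: str, path: str, params: dict, status: int) -> dict:
    """Log one REST call twice: a human-readable line and a JSON record.

    `params` is assumed to be redacted already. Returns the structured record.
    """
    record = {"method": method, "path": path, "params": params, "status": status}
    # Unstructured message, e.g. "GET /v1/page/Foo/history -> 200".
    text_log.info("%s %s -> %d", method, path, status)
    # Structured message: one JSON object per call, keys sorted for stable output.
    structured_log.info(json.dumps(record, sort_keys=True))
    return record
```

Neither logger emits anything until a handler and the INFO level are configured, so where each channel ends up (a file, Kafka, or Logstash) is a matter of logging configuration rather than code.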

Event Timeline

Note that for the Action API, these logs are also converted into Kafka events and fed into the analytics infrastructure for researchers to analyze.

Also, these logs are not delivered to Logstash because of the volume they produce.

If we expect the REST API to eventually see request rates similar to the Action API's, we have to design for the same capability right away. The logs going to Kafka for the Action API conform to the schema. If we can reuse the same schema for the REST API, great. If not, that's OK too, but it's probably better to keep it close to the action_api schema.

@Pchelolo From a product standpoint, there are a few things that I'd want from the logging that aren't in that schema:

  • Matching route. It'd be good to know how many hits we get on page history, say, without having to do page title regexes.
  • Authentication method. Cookies, OAuth 1.0, OAuth 2.0, something else.
  • Client ID. If available. See T251812.
  • HTTP status code. I don't see it in the schema, and especially for REST APIs, the status code can convey a lot.
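The fields requested above could be modelled roughly as follows. This is only a sketch: the field names, the AuthMethod values, and the route templates in the usage example are assumptions, not part of any existing schema. Logging the matched route template rather than the concrete path is what turns "how many hits on page history" into a simple group-by instead of a title regex.

```python
from collections import Counter
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Iterable, Optional


class AuthMethod(str, Enum):
    """Authentication methods named in the discussion (string values are hypothetical)."""
    COOKIE = "cookie"
    OAUTH1 = "oauth1"
    OAUTH2 = "oauth2"
    OTHER = "other"


@dataclass(frozen=True)
class RestRequestEvent:
    route: str                # matched route template, not the concrete request path
    auth_method: AuthMethod
    client_id: Optional[str]  # None when no client ID is available
    status_code: int          # HTTP status code of the response

    def to_dict(self) -> dict:
        """Plain dict suitable for JSON serialization."""
        data = asdict(self)
        data["auth_method"] = self.auth_method.value
        return data


def count_by_route(events: Iterable[RestRequestEvent]) -> dict:
    """Count hits per route template."""
    return dict(Counter(event.route for event in events))
```

With events for "/v1/page/{title}/history" and "/v1/search/page", count_by_route returns one count per template, regardless of which page titles were requested.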

cc @Ottomata ^^ it seems like all these are simple enough to get into our api-request schema.

FYI, the mediawiki/event-schemas repo is deprecated. mediawiki/api/request now lives in schemas/event/primary.

it seems like all these are simple enough to get into our api-request schema.

+1

For docs on how to modify a schema:
https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas#Modifying_schemas

For the HTTP-specific fields, we should probably modify the http subschema fragment.

We do have the status code there; it's just that mediawiki/api/request doesn't directly $ref the http fragment schema, only because it was made before the http fragment schema existed. We should switch to using it and add the other needed HTTP fields.
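To illustrate why $ref-ing the fragment would bring status_code into the request schema, here is a toy resolver that inlines {"$ref": ...} nodes from an in-memory registry. The trimmed schemas and the "/fragment/http" key are hypothetical; the real schemas in schemas/event/primary are versioned, much larger, and resolved by their own tooling, not by this code.

```python
from typing import Any

# Hypothetical, heavily trimmed schemas for illustration only.
HTTP_FRAGMENT = {
    "type": "object",
    "properties": {
        "method": {"type": "string"},
        "status_code": {"type": "integer"},
    },
}

REQUEST_SCHEMA = {
    "type": "object",
    "properties": {
        "route": {"type": "string"},
        "http": {"$ref": "/fragment/http"},
    },
}


def resolve_refs(node: Any, registry: dict) -> Any:
    """Recursively replace {"$ref": uri} nodes with the registered schema.

    Builds new dicts/lists, so the input schema is left unchanged.
    """
    if isinstance(node, dict):
        if "$ref" in node:
            # Sibling keywords next to $ref are ignored, as in draft-07.
            return resolve_refs(registry[node["$ref"]], registry)
        return {key: resolve_refs(value, registry) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, registry) for item in node]
    return node
```

The resolver does not handle JSON Pointer fragments (e.g. "#/definitions/...") or cyclic references; it only shows that once the request schema points at the fragment, every field the fragment defines becomes part of the request schema.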

Will we want the core REST events to belong to a different stream? The current action API stream is mediawiki.api-request, which ends up as the mediawiki_api_request table in Hive.