Our stack comprises many components/servers that interact with each other in order to fulfil clients' requests (or to prepare data to be served to clients later). Serving a single request may spawn requests to other components, whose responses are assembled before the result is returned to the client. We therefore need a kind of distributed stack trace that lets us pinpoint problematic links in the request chain.
Some request identification already exists in our infrastructure, but only at the sub-system level:
- MediaWiki's WebRequest relies on the UNIQUE_ID env variable provided by Apache's mod_unique_id
- RESTBase and the services behind it use and propagate the X-Request-Id header
- EventBus relies on the same X-Request-Id header when creating events for both asynchronous updates and JobQueue messages
- Thumbor uses a custom Thumbor-Request-Id header
There are probably more such examples.
To trace all the requests triggered by an initial (external) request, every system in our infrastructure should identify requests in the same way, use that identifier when logging, and propagate it to the next links in the request chain.
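The per-service requirements above (adopt the received ID, log it, forward it) can be sketched in Python as follows. This is a minimal, framework-agnostic illustration under stated assumptions: `handle_request`, `outgoing_headers` and `RequestIdFilter` are hypothetical names, and generating a local UUID v4 when no ID arrives is an assumed fallback, not something this task specifies. A `contextvars.ContextVar` keeps the ID scoped to the request being handled, so concurrent requests in separate threads or asyncio tasks do not see each other's IDs.

```python
import contextvars
import logging
import uuid

# Header name taken from the proposal; HTTP header names are case-insensitive.
REQUEST_ID_HEADER = "X-Request-Id"

# ID of the request currently being handled; None outside of a request.
current_request_id = contextvars.ContextVar("current_request_id", default=None)


class RequestIdFilter(logging.Filter):
    """Stamp every log record with the current request ID ("-" if none)."""

    def filter(self, record):
        record.request_id = current_request_id.get() or "-"
        return True


def outgoing_headers(extra=None):
    """Build headers for a sub-request, forwarding the current request ID."""
    headers = dict(extra or {})
    rid = current_request_id.get()
    if rid is not None:
        headers[REQUEST_ID_HEADER] = rid
    return headers


def handle_request(incoming_headers):
    """Hypothetical request handler: adopt, log, and propagate the ID.

    Assumption: if no ID was received (e.g. the request did not pass
    through the edge), one is generated locally so it can still be traced.
    """
    lowered = {k.lower(): v for k, v in incoming_headers.items()}
    rid = lowered.get(REQUEST_ID_HEADER.lower()) or str(uuid.uuid4())
    token = current_request_id.set(rid)
    try:
        logging.getLogger("service").info("handling request")
        # A real service would send its sub-request(s) with these headers.
        return outgoing_headers({"Accept": "application/json"})
    finally:
        current_request_id.reset(token)
```

Attaching the filter to the log handler, with a format such as `%(request_id)s %(message)s`, makes every line emitted while handling a request carry its ID.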
We propose using a UUID (v1 or v4) as the request identifier, carried in the X-Request-Id header (or an equivalent field in other entities). The Varnish front-end (soon to be replaced by ATS, Apache Traffic Server) is the main entry point for external requests, so it can generate the request IDs and attach them to requests as the X-Request-Id header, which all entities behind it can then use and propagate. Furthermore, every entity that responds to a request must log the request ID it received or generated.
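The entry-point behaviour described above could look roughly like the Python sketch below. It is only an illustration: in practice this logic would live in Varnish (VCL) or ATS configuration, and both the `trusted` flag and the choice to discard IDs sent by external clients are assumptions, not part of the proposal. `assign_request_id`, `is_valid_request_id` and `get_header` are hypothetical names. Validation accepts only canonical UUID v1/v4 strings.

```python
import uuid

# Header name from the proposal; lookups below ignore case, as HTTP does.
REQUEST_ID_HEADER = "X-Request-Id"


def get_header(headers, name):
    """Return the value of header `name` (case-insensitive), or None."""
    lname = name.lower()
    for key, value in headers.items():
        if key.lower() == lname:
            return value
    return None


def is_valid_request_id(value):
    """Accept only canonical UUID v1 or v4 strings."""
    try:
        parsed = uuid.UUID(value)
    except (TypeError, ValueError, AttributeError):
        return False
    # Reject non-canonical spellings such as braces or missing hyphens.
    return parsed.version in (1, 4) and str(parsed) == value.lower()


def assign_request_id(headers, trusted=False):
    """Hypothetical edge logic: return a copy of `headers` with an ID set.

    Assumption: an ID is kept only if it comes from a trusted caller and is
    a valid UUID; otherwise (missing, malformed, or client-supplied) it is
    replaced by a freshly generated UUID v4.
    """
    existing = get_header(headers, REQUEST_ID_HEADER)
    # Drop any existing spelling of the header before setting it again.
    out = {k: v for k, v in headers.items()
           if k.lower() != REQUEST_ID_HEADER.lower()}
    if trusted and is_valid_request_id(existing):
        out[REQUEST_ID_HEADER] = existing
    else:
        out[REQUEST_ID_HEADER] = str(uuid.uuid4())
    return out
```

Replacing client-supplied IDs prevents external callers from injecting arbitrary or colliding identifiers into internal logs; whether that policy is desirable is left open here.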
- T89562: RESTBase should set Request-ID and perhaps X-Forwarded-For headers for external requests
- T97226: Include the request ID in API request logs
- T97207: Forward X-Request-ID header in outgoing requests
- T117021: Request ID for debug log
- T113817: Connect Hadoop records of the same request coming via different channels
- T200594: Add client identifier to requests sent from Kartotherian to WDQS
- T193050: Include request id (if present) in a comment in DB queries
- T147101: Uniform performance insight for different services (tracking)