Page MenuHomePhabricator

Check & possibly improve request id handling in changeprop
Closed, ResolvedPublic

Description

In T151702, we saw a lot of changeprop requests sharing the same request id. All of these requests were triggered by the same edit, but targeted different titles. The shared request id created confusion, with those investigating suspecting retry issues.

Assuming the shared request ID is indeed added / forwarded in ChangeProp, I am proposing to change the behavior so that:

  • Each derived request initiated by ChangeProp gets a fresh request-id assigned. Retries (if any) use the same request id.
  • Additionally, each derived request gets a triggered-by header assigned, which contains a comma separated list of trigger URLs, with each component escaped with encodeURIComponent. This list is appended to the same way https://en.wikipedia.org/wiki/X-Forwarded-For is, so that it builds up a complete chain of events / request ids that triggered the actual request.

The result should be:

  • Avoid confusion between requests / retries and triggers.
  • Ability to filter on request id only, seeing only the primary event processing.
  • Ability to filter on triggered-by only, seeing only derived events. Using phrase search, even specific parts of the update graphs can be targeted in queries.

Related Objects

Event Timeline

Hmmm, maybe one needs to get used to it, but I find the feature of the request-id field as-is highly useful, as it allows us to track all of the requests that were generated from the original one. Also, IIRC, this is precisely the definition we chose for the field: a request ID gets assigned when an external action/request comes in, and then it is used to trace all of the subsequent requests that have triggered as a result.

As such, the primary request id is generated by MW.

The distinction is between the primary request processing itself, and async updates. This proposal does not hinder the ability to list all derived async requests using the primary request id as it would be included in triggered-by, but it avoids suggesting that all (potentially millions) of derived async requests are indeed the same request.

@GWicke The triggered-by includes the topic and primary event URI, not the primary request ID, so we won't be able to filter on it because there might be many requests with the same triggered-by value. So if we change the reqId logic we will need to change the triggered-by too.

@Pchelolo, the proposal in the task description is to treat triggered-by as a list of URIs, which would allow us to include the request id as well.

@GWicke Hm, how would the URI contain the request-id? Right now we use the URI of the resource, like https://en.wikipedia.org/wiki/Main_Page or https://en.wikipedia.org/api/rest_v1/page/summary/Main_Page. That looks pretty logical to me, since the semantics is "change of this resource provoked a change in that other resource". Where would you put the request ID in this URI?

I think that having a shared reqId is better even for tracking retry issues. You just search for the problematic reqId in the logs, and if there's more then 3 hitting the same URI with the same reqId - you have a retry problem. I'm not sure how this could create any confusion.

@Pchelolo, the proposal is to make triggered-by a *list* of URLs, so we can have something like req:<reqid>, https://en.wikipedia.org/wiki/Main_Page. This would match the entry when searching for the request id in any field, but also makes it possible to filter for the primary request / sub-requests separately.

Another option for incorporating the original request id in the first triggered-by entry: https://en.wikipedia.org/wiki/Main_Page#req:<reqid>

Ok, what can we do here is to prepend the current x-triggered-by header with UUID of the original event and then leave all the rest unchanged. The requestId and event id will not be reused any more. Sounds good?

Ok, what can we do here is to prepend the current x-triggered-by header with UUID of the original event and then leave all the rest unchanged. The requestId and event id will not be reused any more. Sounds good?

Sounds great to me. Logically, pre-pending makes the most sense as well. The request came in, and then we looked at the URL, which produced the edit event. To make this a URL, lets use the req: or reqid: pseudo protocol?

Bumping priority, as this is almost done.

mobrovac edited projects, added Services (done); removed Services (next).

Merged and deployed, resolving.