
upstream request timeout, http-status 504 in the API
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • wget -O - "https://de.wikipedia.org/w/api.php?format=json&formatversion=2&action=query&list=search&srsearch=insource%3Ahttps%20insource%3A%2F%5C%5Bhttps%3A%5C%2F%5C%2F%5B%5E%20%5C%5D%5D%2A%27%27%2F&srnamespace=0&srlimit=max&srinfo=&srprop="
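The percent-encoded srsearch value in the command above is hard to read. As a minimal sketch (the helper name decode_params is only an illustration, not part of any Wikimedia tooling), the parameters can be decoded with Python's standard library to see which query is actually sent:

```python
from urllib.parse import urlsplit, parse_qs

# URL copied verbatim from the wget command in the steps above.
URL = (
    "https://de.wikipedia.org/w/api.php?format=json&formatversion=2"
    "&action=query&list=search"
    "&srsearch=insource%3Ahttps%20insource%3A%2F%5C%5Bhttps%3A%5C%2F%5C%2F"
    "%5B%5E%20%5C%5D%5D%2A%27%27%2F"
    "&srnamespace=0&srlimit=max&srinfo=&srprop="
)


def decode_params(url):
    """Return the query parameters of url as a dict of decoded strings."""
    # keep_blank_values=True preserves the empty srinfo= and srprop= parameters.
    pairs = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {name: values[0] for name, values in pairs.items()}


params = decode_params(URL)
print(params["srsearch"])
```

The decoded search string is insource:https insource:/\[https:\/\/[^ \]]*''/, that is, a plain insource:https filter combined with a regex insource search.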

What happens?:
--2025-11-13 07:39:49-- https://de.wikipedia.org/w/api.php?format=json&formatversion=2&action=query&list=search&srsearch=insource%3Ahttps%20insource%3A%2F%5C%5Bhttps%3A%5C%2F%5C%2F%5B%5E%20%5C%5D%5D%2A%27%27%2F&srnamespace=0&srlimit=max&srinfo=&srprop=
Resolving de.wikipedia.org (de.wikipedia.org)... 2620:0:861:ed1a::1, 208.80.154.224
Connecting to de.wikipedia.org (de.wikipedia.org)|2620:0:861:ed1a::1|:443... connected.
HTTP request sent, awaiting response... 504 Gateway Timeout
Retrying.

What should have happened instead?:
Retrieve some data

This has been happening since 11 November; I did not see the problem before. It happens with various search patterns, both from my home machine (not logged in) and from within the Toolforge cloud when logged in with my bot account.

In my PHP code I get this response body: {"httpReason":"upstream request timeout","httpCode":504} in addition to HTTP status 504.
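For client scripts it may help to tell this kind of failure apart from a normal API result. The following is only a sketch in Python rather than PHP: the function name gateway_failure_reason is hypothetical, and it assumes the error body has the httpReason/httpCode shape quoted above and that 503 and 504 are the only statuses of interest (both are reported in this task).

```python
import json

# Status codes reported in this task: 504, and sometimes 503.
GATEWAY_STATUS_CODES = {503, 504}


def gateway_failure_reason(status, body):
    """Return a short reason string if the response looks like a
    gateway-level failure, or None if it does not."""
    if status not in GATEWAY_STATUS_CODES:
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        # Body is not JSON (for example an HTML error page).
        payload = None
    # Assumption: the error body looks like the one quoted in this task.
    if isinstance(payload, dict) and "httpReason" in payload:
        return payload["httpReason"]
    return f"HTTP {status}"


sample = '{"httpReason":"upstream request timeout","httpCode":504}'
print(gateway_failure_reason(504, sample))
```

A caller could log the returned reason and decide whether to retry later, instead of treating the body as a search result.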

Event Timeline

Aklapper renamed this task from upstream request timeout, http-status 504 in the API to upstream request timeout, http-status 504 in the API. Nov 13 2025, 9:44 AM
Aklapper updated the task description. (Show Details)

I have replicated this with wget and with the API Sandbox on German Wikipedia. The API Sandbox on English Wikipedia returns the data. @Wurgl, is the English sandbox broken on your end as well?

Atieno triaged this task as Medium priority. Nov 14 2025, 10:17 AM
Atieno moved this task from Incoming (Needs Triage) to Bugs & Chores on the MW-Interfaces-Team board.

Yesterday, the English sandbox returned an error. Today it works.

The error on deWP always appears after 15 seconds. Maybe this time limit is set to a larger value on enWP?

doctaxon raised the priority of this task from Medium to High. Nov 15 2025, 11:22 PM

Sometimes it's a 503 error instead of a 504 error.

One of my more important bot scripts relies on the API search query. It runs every hour for important functions on dewiki, and there are complaints whenever it is not working. That's why I ask you to allow me to raise the priority to High.

Thank you

Let me know if there is a way, other than priorities, to point out something important to someone or to someone's board. I either don't know about it or am missing it on Phabricator.

It looks as though the problem is not the search logic: in https://de.wikipedia.org/wiki/Special:search you can search for insource:https insource:/\[https:\/\/[^ \]]*/ and get results. That search takes ~24 seconds. The same search in the API Sandbox or with wget always fails after 15.x seconds.

So the reason for this 504 status seems to be in some frontend code which is used in the API but not in Special:search.

@dcausse : it looks like it's something with these merges: https://gerrit.wikimedia.org/r/q/project:mediawiki/extensions/CirrusSearch

It looks like you have insight into the code changes. Can you help with this bug here?

Indeed, the internal timeout should be 50s to allow the regex to run. It is possible that something changed in the request flow so that a component now fails earlier than the allowed 50s.

... there is now a component failing earlier than the allowed 50s.

How can we find out which component is failing?

We're looking into it; we'll hopefully have an explanation and a fix soon.

Change #1206205 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] rest-gateway: set 53s timeout for action API

https://gerrit.wikimedia.org/r/1206205

Change #1206205 merged by jenkins-bot:

[operations/deployment-charts@master] rest-gateway: set 53s timeout for action API

https://gerrit.wikimedia.org/r/1206205

This issue was a result of the migration of the action API to a common gateway within WMF infrastructure (work ticket: T408223, higher-level reasoning/tracking: T406607). We're currently doing a slow rollout of wikis by group, with the exception of enwiki, which means that all other wikis are currently behind the gateway, along with 10% of requests for enwiki. By default the gateway itself imposes a timeout of 15 seconds, which was causing the issue seen here. We've since raised the timeout and the queries in this ticket are now succeeding. Apologies for the disruption.
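The arithmetic behind this can be shown with a small sketch. It assumes a deliberately simplified model in which a request passes through several layers and the layer with the smallest timeout ends it first; the layer names "gateway" and "search" and the function outcome are illustrative only. The numbers are the ones mentioned in this task: a 15s gateway default, a 50s internal search timeout, the new 53s gateway timeout, and a query that takes about 24s.

```python
def outcome(query_seconds, timeouts):
    """Return "ok" if the query finishes before every timeout, otherwise
    name the layer whose (smallest) timeout ends the request."""
    # The layer with the smallest limit is the one that gives up first.
    limit_name, limit = min(timeouts.items(), key=lambda item: item[1])
    if query_seconds < limit:
        return "ok"
    return f"timeout at {limit_name} after {limit}s"


# Timeouts in seconds, taken from the figures quoted in this task.
before_fix = {"gateway": 15, "search": 50}
after_fix = {"gateway": 53, "search": 50}

query_seconds = 24  # approximate duration seen via Special:Search

print(outcome(query_seconds, before_fix))  # timeout at gateway after 15s
print(outcome(query_seconds, after_fix))   # ok
```

Under this model, setting the gateway timeout slightly above 50s lets the internal search timeout become the effective limit again.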

dcausse assigned this task to hnowlan.

This should be fixed; I can see the partial search response instead of the error.