
502 /504 Bad Gateway issue on Petscan
Closed, Invalid · Public

Description

See https://bitbucket.org/magnusmanske/petscan/issues/163/petscans-down instead


Working Petscans frequently return a 502 Bad Gateway error.

Examples I use regularly, and which frequently enough return a 502, are:

From the user's point of view, there doesn't seem to be any rhyme or reason. Sometimes the petscan works. Sometimes a 502 is returned.

  1. What is the cause of these 502s?
  2. Can the underlying problem be fixed, please?
  3. If the problem cannot be fixed, can we have a more informative error message - e.g. if there's a query timeout, could we be informed of this?
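One way the third request could be met is sketched below in Python: a wrapper that enforces a deadline on a query and, when the deadline passes, returns a response that names the timeout instead of leaving the proxy to emit a bare gateway error. This is only an illustration, not PetScan's code; `run_with_deadline`, `slow_query`, and the message wording are hypothetical.

```python
import concurrent.futures
import time


def run_with_deadline(query_fn, timeout_s):
    """Run query_fn() and return a (status, body) pair.

    Hypothetical helper: if the query does not finish within timeout_s
    seconds, report a 504 with an explanatory message.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(query_fn)
    try:
        return 200, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return 504, f"Query exceeded the {timeout_s} s time limit"
    finally:
        # Return without waiting; the worker thread is not cancelled
        # and keeps running until the query itself finishes.
        executor.shutdown(wait=False)


def slow_query():
    """Stand-in for a long-running SQL query."""
    time.sleep(0.3)
    return ["result"]
```

Calling `run_with_deadline(slow_query, 0.05)` returns a 504 with the explanatory text, while a fast query passes through with status 200.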

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jan 31 2019, 5:41 PM
Aklapper closed this task as Invalid. · Jan 31 2019, 9:26 PM

Hi, please use the "Issues" link on top of http://petscan.wmflabs.org/ to report issues, as Petscan does not track its issues in Wikimedia Phabricator. Thanks.

Tagishsimon reopened this task as Open. · Jan 31 2019, 10:20 PM

So far as I know, the fault is not with Petscan, but with the WMF infrastructure on which it operates. Please do not be so hasty to dismiss problems like this.

Ideally Cloud-Services will liaise with @Magnus either to bottom out & solve the issue; or else to provide an explanation for the 502 issue which we can use to console ourselves each time it happens.

zhuyifei1999 closed this task as Invalid. · Feb 1 2019, 7:22 AM
zhuyifei1999 added a subscriber: zhuyifei1999.

As far as I know, if https://petscan.wmflabs.org/ loads, there is nothing going wrong with the WMCS networking & routing, which would normally cause 502s.

If nothing is wrong with networking / routing, 502 means the application itself is taking way too long to respond, and you should get the tool maintainer to debug their application and see what it is spending the time on. You can reopen this task if the time is all wasted on WMCS infrastructure.

Tagishsimon reopened this task as Open. · Feb 1 2019, 9:41 AM

How about we work out what's going on BEFORE we peremptorily close this issue. "As far as I know" does not cut it.

Magnus added a comment. · Feb 1 2019, 9:49 AM

OK, so what appears to happen is that SQL queries time out and take PetScan with them. Note:

  • I wrote some code that re-arranges certain large queries into smaller ones, which cuts down on the timeouts; that code has been live for weeks
  • That works fine on the dev machine but not reliably on the production machine
  • The dev machine has fewer resources than production, but is otherwise identical (OS etc)

As this does not fail reproducibly, it's either some odd bug in my code, or some situation on the DB replicas.
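The re-arrangement code mentioned in the first bullet is not shown in this thread. As a rough illustration of the general idea only, the Python sketch below splits one large lookup into fixed-size batches and merges the results. It assumes the query can be expressed over a list of page IDs; `chunked`, `run_in_batches`, `run_query`, and the batch size are all hypothetical names and values, not PetScan's.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(page_ids, run_query, batch_size=1000):
    """Call run_query once per batch of page_ids and merge the rows.

    run_query is a hypothetical callable that takes a list of IDs and
    returns a list of result rows for just those IDs.
    """
    rows = []
    for batch in chunked(page_ids, batch_size):
        rows.extend(run_query(batch))
    return rows
```

Each batch is a shorter query, so any single query is less likely to hit a timeout, at the cost of more round trips to the database.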

Magnus added a comment. · Feb 1 2019, 9:51 AM

Also, I just clicked on the two examples. They took 133 and 173 seconds, and returned 50 and 1 results, respectively. No 502s, though I have seen those occasionally.

Thanks Magnus. I run those two daily and get perhaps 20% 502s, without an obvious pattern. There are others (e.g. listed on https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Women_in_Red/Metrics/Wikidata ) which 502 more reliably.

I think it's for you to decide whether to hold this ticket open or close it.

Magnus added a comment. · Feb 1 2019, 1:09 PM

Testing on dev now. Looks like frequent "Lost connection to MySQL server during query" errors from the DB replicas.
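A common way to cope with intermittent connection drops like this is to retry the query a few times with an increasing delay. The thread does not say whether PetScan does this, so the Python sketch below is only an assumption-laden illustration: `LostConnection` is a hypothetical stand-in for whatever exception a real database driver raises, and the attempt count and delays are arbitrary.

```python
import time


class LostConnection(Exception):
    """Hypothetical stand-in for a driver's 'Lost connection' error."""


def retry_on_lost_connection(run_query, attempts=3, base_delay_s=1.0,
                             sleep=time.sleep):
    """Call run_query(), retrying with exponential backoff on LostConnection.

    The delay doubles after each failed attempt; the last failure is
    re-raised so the caller can report it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return run_query()
        except LostConnection:
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
```

The `sleep` parameter exists only so the delay can be replaced in tests; retries add latency, so they help with sporadic drops but not with queries that are simply too slow.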

Sorry to burden you with this. As a user 502s are v.frustrating, but I can also imagine it's v.frustrating for you to be asked to debug when you could be making snowmen instead.

Magnus added a comment. · Feb 1 2019, 2:40 PM

Try it now...

Tagishsimon added a comment. · Edited · Feb 1 2019, 6:59 PM

Mixed results. The following three worked like a charm:

*Articles with no wikidata item
:*to 5 levels of depth - [http://petscan.wmflabs.org/?psid=5984708&al_commands=P31%3AQ5%0AP21%3AQ6581072 auto-run]
:*to 6 levels of depth - [http://petscan.wmflabs.org/?psid=7075418&al_commands=P31%3AQ5%0AP21%3AQ6581072 auto-run]
:*to 7 levels of depth - [http://petscan.wmflabs.org/?psid=7075446&al_commands=P31%3AQ5%0AP21%3AQ6581072 auto-run]

But the next three all returned a 502:

*Articles, with wikidata items specifying 'human' but with no gender
:*to 5 levels depth - [http://petscan.wmflabs.org/?psid=5823558&al_commands=P21%3AQ6581072 auto-run]
:*to 6 levels depth - [http://petscan.wmflabs.org/?psid=7075471&al_commands=P21%3AQ6581072 auto-run]
:*to 7 levels depth - [http://petscan.wmflabs.org/?psid=7075514&al_commands=P21%3AQ6581072 auto-run]

Jar added a subscriber: Jar. · Aug 6 2019, 8:14 PM
Restricted Application added a subscriber: alanajjar. · Aug 6 2019, 8:15 PM

There are some PetScan outages, reason unknown so far, but outside those, the above queries all work fine.

I have not seen a successful Petscan query for days. I always get a 504 Gateway Time-out.

Same here. 504 Gateway Time-out on http://petscan.wmflabs.org/.

Aklapper closed this task as Invalid. · Edited · Apr 29 2020, 12:07 PM

This seems to be tracked in https://bitbucket.org/magnusmanske/petscan/issues/163/petscans-down , as Magnus uses bitbucket.org to track issues.
Hence closing as invalid here as this task is tagged as VPS-Projects.

If this issue has been investigated and if this issue is/was a problem with the Cloud-VPS infrastructure itself, please reopen and tag it as Cloud-VPS.

Aklapper updated the task description. · Apr 29 2020, 12:08 PM

@Aklapper actually the repo is https://github.com/magnusmanske/petscan_rs (the bitbucket one is the old C++ version).

@Magnus: Ah, thanks! Could be nice to add such documentation to https://wikitech.wikimedia.org/wiki/Nova_Resource:Petscan ?

Aklapper renamed this task from 502 Bad Gateway issue on Petscan to 502 /504 Bad Gateway issue on Petscan. · May 1 2020, 9:17 AM
MB-one added a subscriber: MB-one. · May 2 2020, 10:47 AM

For smaller queries, I think http://petscan-dev.wmflabs.org/ can be used. I don't know about it in detail.

It's my dev server, should work for any query size. Feel free to use when the main site is down.