I break the tool :(
Closed, Resolved · Public

Description

Attempting to load https://techcontribs.toolforge.org/uid/reedy gives

An error occurred while trying to load data: Timeout while fetching Gerrit patches (internal-error). Please try again later.

An error occurred while trying to load data: Timeout while fetching Phabricator tasks (internal-error). Please try again later.

Event Timeline

Reedy moved this task from Backlog to Bugs on the Tool-techcontribs board.

Hi, @Reedy! This only happens to people whose contributions number in the thousands and span many years. I've set internal timeouts (30 seconds) so that Tech Contribs doesn't pull too much data from the Gerrit and Phabricator APIs, since methods of bulk data access (like the Replica DBs or wiki dumps) don't seem to exist for our DevOps stuff. The goal is to avoid unintentionally bringing down Gerrit, or causing enough load that it would be perceived as some form of DoS attack. 30 seconds is an arbitrary number. I haven't been contacted by Release Engineering to slow down, which is good, but I also understand that it's pretty low. This behavior is important to keep Gerrit from blocking the tool (or other tools, as collateral damage) from making requests, which does seem to happen from time to time. When it does, it causes a different kind of error:

[Error]: terminated (internal-error)
    at /workspace/.next/server/chunks/690.js:1:22087
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Z (/workspace/.next/server/chunks/690.js:1:30168) {
  errorResponse: {
    code: 'internal-error',
    message: 'terminated',
    stack: 'TypeError: terminated\n' +
      '    at Fetch.onAborted (node:internal/deps/undici/undici:11014:53)\n' +
      '    at Fetch.emit (node:events:519:28)\n' +
      '    at Fetch.terminate (node:internal/deps/undici/undici:10200:14)\n' +
      '    at Object.onError (node:internal/deps/undici/undici:11132:38)\n' +
      '    at _Request.onError (node:internal/deps/undici/undici:7303:31)\n' +
      '    at errorRequest (node:internal/deps/undici/undici:9863:17)\n' +
      '    at TLSSocket.onSocketClose (node:internal/deps/undici/undici:9018:9)\n' +
      '    at TLSSocket.emit (node:events:531:35)\n' +
      '    at node:net:337:12\n' +
      '    at TCP.done (node:_tls_wrap:657:7)'
  }
}

This basically just means "the other side immediately closed the connection".
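As a rough illustration of the two failure modes described above (the tool's own timeout versus the remote side closing the connection), here is a minimal TypeScript sketch. It is not the actual Tech Contribs code: `runWithTimeout` and `classifyFailure` are hypothetical names, and the only detail taken from the trace is that the closed-connection case surfaces as a `TypeError` with the message `terminated`.

```typescript
type Failure = "timeout" | "remote-closed" | "other";

// Hypothetical helper: decide which kind of failure we hit.
// `timedOut` is true when our own timer aborted the request.
function classifyFailure(err: unknown, timedOut: boolean): Failure {
  if (timedOut) return "timeout";
  // Matches the 'TypeError: terminated' seen in the stack trace above.
  if (err instanceof TypeError && err.message === "terminated") {
    return "remote-closed";
  }
  return "other";
}

type Outcome<T> = { ok: true; value: T } | { ok: false; failure: Failure };

// Hypothetical helper: run `task` and abort it after `ms` milliseconds.
// `task` must pass the signal on to fetch() for the abort to take effect.
async function runWithTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<Outcome<T>> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return { ok: true, value: await task(controller.signal) };
  } catch (err) {
    return { ok: false, failure: classifyFailure(err, controller.signal.aborted) };
  } finally {
    // Always clear the timer so it cannot fire after the task settles.
    clearTimeout(timer);
  }
}
```

A call might look like `runWithTimeout((signal) => fetch(url, { signal }), 30000)`, where `url` stands in for whichever Gerrit or Phabricator endpoint is being queried. Separating the two cases would let the UI show a different message for "we gave up" and "the server hung up on us".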

Luckily, for Phabricator, this is a recoverable error. If you visit your page again, you should now see Phabricator statistics. For Gerrit, it's not recoverable for now, since Gerrit doesn't allow sorting change searches by date ascending (it's not mentioned at all in the documentation), which makes a solution a bit harder to implement. The most likely solution is to temporarily save whatever data we've pulled from Gerrit in the cache (which is a lot of data; I hope it doesn't end up filling the cache!), and then continue from there if the user keeps retrying. Aside from that, I should probably also add a "retry" button to those errors to make it clear that the Phabricator one (and hopefully the Gerrit one too, soon) is recoverable.
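The "save partial data and continue on retry" idea could look roughly like the sketch below. Everything here is an assumption for illustration: the `Page` and `Progress` shapes, the in-memory `progressCache`, and the `fetchPage(offset)` callback are hypothetical stand-ins, not the tool's real cache or Gerrit client. It also resumes by offset, which is only safe if the result order doesn't shift between attempts; with newest-first sorting, new changes would move the offsets, which is exactly why the lack of an ascending sort makes this harder.

```typescript
// Hypothetical shapes: one page of results and the progress saved so far.
interface Page<T> { items: T[]; more: boolean }
interface Progress<T> { items: T[]; complete: boolean }

// In-memory stand-in for the tool's cache, keyed by username.
const progressCache = new Map<string, Progress<unknown>>();

// Pure step: fold one fetched page into the saved progress.
function appendPage<T>(progress: Progress<T>, page: Page<T>): Progress<T> {
  return {
    items: [...progress.items, ...page.items],
    complete: !page.more,
  };
}

// Fetch pages until done or until the time budget runs out, saving
// progress after every page so that a retry continues where we stopped.
async function fetchResumable<T>(
  user: string,
  fetchPage: (offset: number) => Promise<Page<T>>,
  budgetMs: number,
): Promise<Progress<T>> {
  const saved = progressCache.get(user) as Progress<T> | undefined;
  let progress: Progress<T> = saved ?? { items: [], complete: false };
  const deadline = Date.now() + budgetMs;
  while (!progress.complete && Date.now() < deadline) {
    // Resume from however many items we already hold.
    const page = await fetchPage(progress.items.length);
    progress = appendPage(progress, page);
    progressCache.set(user, progress);
  }
  return progress;
}
```

If the returned `complete` flag is still `false`, the page could show the timeout error together with a "retry" button; the next call for the same user would then pick up from the cached items instead of starting over.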

@Chlod There is a replica for Gerrit at gerrit-replica. It just doesn't have a web interface, but everything else is the same, iirc. @hashar (or maybe @Dzahn, iirc) would probably be your best bet for learning about its load limitations and capacity, though.

@Reedy when writing 🐛 bug 🎫 tickets, it's a good idea to describe the bug properly in the name. It's not immediately obvious here what the referents of "I" or "the tool" are. 🤷🤦

The tool is defined by the tag.

It's pretty clear that "I" means me.

Please keep using regular gerrit and not the replica unless you explicitly hear otherwise from releng or sre-collab. A WMF IP should not be throttled. Since nothing secret is inside Gerrit there might be ways to actually offer methods of bulk data access in the future.

Chlod claimed this task.

Timeout for both Gerrit and Phabricator data fetching has been bumped to 600 seconds. You do not break the tool anymore! :D