
Exception of type Wikimedia\Rdbms\DBConnectionError after an API query
Closed, Resolved · Public

Description

The request
https://wikitech.wikimedia.org/w/api.php?action=query&formatversion=2&prop=revisions&format=json&rvprop=content&rvsection=2&titles=Server%20Admin%20Log
causes this error:

{
    "error": {
        "code": "internal_api_error_DBConnectionError",
        "info": "[62a725b459d8e036f5a4e627] Caught exception of type Wikimedia\\Rdbms\\DBConnectionError"
    },
    "servedby": "labweb1002"
}
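For anyone re-running the report, the request can be rebuilt from its parameters and the error body checked programmatically. A minimal sketch, assuming only the Python standard library; the helper names `build_url` and `parse_error` are hypothetical, and nothing here contacts the server. It shows that the bracketed prefix of `info` is the exception ID used to find the matching server-side log entry.

```python
import json
from urllib.parse import quote, urlencode

# Query parameters copied from the request in this task (insertion order kept).
PARAMS = {
    "action": "query",
    "formatversion": "2",
    "prop": "revisions",
    "format": "json",
    "rvprop": "content",
    "rvsection": "2",
    "titles": "Server Admin Log",
}

def build_url(base="https://wikitech.wikimedia.org/w/api.php"):
    """Rebuild the request URL; quote_via=quote encodes spaces as %20, not '+'."""
    return base + "?" + urlencode(PARAMS, quote_via=quote)

def parse_error(body):
    """Return (code, exception_id) for an API error response, or None if there is no error."""
    data = json.loads(body)
    err = data.get("error")
    if err is None:
        return None
    info = err.get("info", "")
    # The exception ID is the bracketed prefix of "info", e.g. "[62a725...] Caught ...".
    exception_id = info[1:info.index("]")] if info.startswith("[") and "]" in info else None
    return err["code"], exception_id

print(build_url())
```

The ID returned by `parse_error` is the same value that appears at the top of the logged exception later in this task.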

Event Timeline

He7d3r created this task. Nov 24 2018, 8:19 PM
Restricted Application added a subscriber: Aklapper. Nov 24 2018, 8:19 PM
Krinkle added a subscriber: Krinkle.
exception
[62a725b459d8e036f5a4e627] /w/api.php?...
/srv/mediawiki/php-1.33.0-wmf.4/includes/libs/rdbms/loadbalancer/LoadBalancer.php:1195

Wikimedia\Rdbms\DBConnectionError: Cannot access the database: Unknown error (10.64.16.79)

#0 /srv/mediawiki/php-1.33.0-wmf.4/includes/libs/rdbms/loadbalancer/LoadBalancer.php(753): Wikimedia\Rdbms\LoadBalancer->reportConnectionError()
#1 /srv/mediawiki/php-1.33.0-wmf.4/includes/GlobalFunctions.php(2653): Wikimedia\Rdbms\LoadBalancer->getConnection(integer, array, boolean)
#2 /srv/mediawiki/php-1.33.0-wmf.4/includes/api/ApiBase.php(651): wfGetDB(integer, string)
#3 /srv/mediawiki/php-1.33.0-wmf.4/includes/api/ApiPageSet.php(1416): ApiBase->getDB()
#4 /srv/mediawiki/php-1.33.0-wmf.4/includes/api/ApiPageSet.php(805): ApiPageSet->getDB()
#5 /srv/mediawiki/php-1.33.0-wmf.4/includes/api/ApiPageSet.php(229): ApiPageSet->initFromTitles(array)
#6 /srv/mediawiki/php-1.33.0-wmf.4/includes/api/ApiPageSet.php(140): ApiPageSet->executeInternal(boolean)
#7 /srv/mediawiki/php-1.33.0-wmf.4/includes/api/ApiQuery.php(234): ApiPageSet->execute()
#8 /srv/mediawiki/php-1.33.0-wmf.4/includes/api/ApiMain.php(1570): ApiQuery->execute()
#9 /srv/mediawiki/php-1.33.0-wmf.4/includes/api/ApiMain.php(531): ApiMain->executeAction()
#10 /srv/mediawiki/php-1.33.0-wmf.4/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#11 /srv/mediawiki/php-1.33.0-wmf.4/api.php(87): ApiMain->execute()
#12 /srv/mediawiki/w/api.php(3): include(string)
#13 {main}
DBConnection
ERROR from Wikimedia\Rdbms\DatabaseMysqlBase::open:
  Error connecting to 10.64.16.79:
    Too many connections
Marostegui edited projects, added cloud-services-team (Kanban); removed DBA.
Marostegui added a subscriber: Marostegui.

This is probably another connection spike, similar to T188589 or T209480, hitting db1073 (the m5 master).

Which makes it much weirder to me.

Unless this happened only before connections dropped to well below the max.

It could have been a spike fast enough not to be captured by the graphs; it would not be the first time we have seen that happen.
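The point about graphs missing a fast spike comes down to sampling: a metric scraped every N seconds only records the values at those instants. A minimal sketch with invented per-second connection counts and a hypothetical limit of 500 (not db1073's real `max_connections`); `sampled` and `spike_hidden` are illustrative helpers, not part of any monitoring tool.

```python
# Hypothetical per-second connection counts (invented for illustration):
# a steady 370 with a 5-second burst to 520 from t=31 to t=35.
per_second = [370] * 60
for t in range(31, 36):
    per_second[t] = 520

LIMIT = 500  # hypothetical connection limit, not the real server setting

def sampled(series, interval):
    """Keep one point every `interval` seconds, as a periodic metrics scraper would."""
    return series[::interval]

def spike_hidden(series, interval, limit):
    """True if the raw series reaches `limit` but the sampled series never does."""
    return max(series) >= limit and max(sampled(series, interval)) < limit

print(spike_hidden(per_second, 10, LIMIT))  # True: samples at t=0,10,...,50 all read 370
print(spike_hidden(per_second, 5, LIMIT))   # False: the sample at t=35 lands inside the burst
```

Whether a burst shows up depends on its length relative to the scrape interval and on where the sample instants fall, which is why a short spike can trip the limit without leaving a trace on the graph.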

Current status, for the record:

root@MISC m5[information_schema]> select user, count(*) as count FROM information_schema.processlist GROUP BY user ORDER BY count DESC;
+-----------------+-------+
| user            | count |
+-----------------+-------+
| nova            |   174 |
| keystone        |    77 |
| neutron         |    48 |
| glance          |    35 |
| designate       |    18 |
| watchdog        |     7 |
| testreduce      |     2 |
| repl            |     2 |
| root            |     2 |
| wikiuser        |     1 |
| event_scheduler |     1 |
| wikiadmin       |     1 |
+-----------------+-------+
12 rows in set (0.00 sec)
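To put the snapshot in numbers, the mysql client's boxed output can be parsed and totalled. A minimal sketch, assuming the table is copied verbatim into a string; `parse_mysql_table` and `connections_by_user` are hypothetical helpers. It totals 368 connections, of which 352 come from the OpenStack services (nova, keystone, neutron, glance, designate).

```python
# Processlist summary copied from the snapshot above.
SNAPSHOT = """
+-----------------+-------+
| user            | count |
+-----------------+-------+
| nova            |   174 |
| keystone        |    77 |
| neutron         |    48 |
| glance          |    35 |
| designate       |    18 |
| watchdog        |     7 |
| testreduce      |     2 |
| repl            |     2 |
| root            |     2 |
| wikiuser        |     1 |
| event_scheduler |     1 |
| wikiadmin       |     1 |
+-----------------+-------+
"""

def parse_mysql_table(text):
    """Parse the mysql client's boxed output into (header, rows)."""
    rows = []
    for line in text.strip().splitlines():
        if not line.startswith("|"):
            continue  # skip +----+ borders and any "N rows in set" footer
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    return rows[0], rows[1:]

def connections_by_user(text):
    """Return ({user: count}, total) from a GROUP BY user summary."""
    header, rows = parse_mysql_table(text)
    if header != ["user", "count"]:
        raise ValueError(f"unexpected header: {header}")
    counts = {user: int(n) for user, n in rows}
    return counts, sum(counts.values())

counts, total = connections_by_user(SNAPSHOT)
openstack = sum(counts[u] for u in ("nova", "keystone", "neutron", "glance", "designate"))
print(total, openstack)  # 368 352
```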

This was opened on Saturday, when the issue paged and we fixed it, though. Has it happened since then?

I suppose I'm asking @Krinkle whether that's from the logs or whether this happened again today.

Ah, I didn't realise it was from Saturday - I got confused with the update at T210332#4786844.

The update from @Krinkle has the same hash as the reporter's (62a725b459d8e036f5a4e627), so I guess he was expanding on the info.
I think this is fine to close, as it is resolved.

Spiking by over 100 connections would take a serious burst of load, which none of our tools are currently able to generate.

Marostegui closed this task as Resolved. Nov 30 2018, 8:54 AM

I think this is resolved - please reopen if you think otherwise!
Thanks for reporting it!

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:08 PM