
Investigate PlatformStatsSummary outage
Closed, ResolvedPublic

Description

The last successful PlatformStatsSummaryJob ran on October 11th.

Since then we've seen it failing with the error:
`[previous exception] [object] (PDOException(code: HY000): SQLSTATE[HY000]: General error: 1615 Prepared statement needs to be re-prepared at /var/www/html/app/Jobs/PlatformStatsSummaryJob.php:158)
[stacktrace]`

We never appear to see this error locally. It appears to be thrown from https://github.com/wbstack/api/blob/main/app/Jobs/PlatformStatsSummaryJob.php#L148, which is particularly surprising since we would expect these prepared statements to be emulated anyway due to https://github.com/wbstack/api/blob/main/app/Jobs/PlatformStatsSummaryJob.php#L144C33-L144C54. We did speculate that this setting might be lost to a race that purges the connection, and therefore that the fix might look like T346245. This appears not to be the case, although, as specified in that ticket, it is possible that a different purge somewhere else is responsible.

Notably, nothing happened on the deploy repo, so I did wonder if the issue was due to one of our more stateful services. I restarted the api-queue as well as the sql primary and secondary. This does not appear to have resolved the issue.

Event Timeline

I just tried manually increasing the configuration value for table_definition_cache, yet the job still fails with the same error.

It seems we have somehow hit the limit for open tables:

MariaDB [(none)]> SHOW VARIABLES LIKE 'table_%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| table_definition_cache     | 1024  |
| table_open_cache           | 2000  |
| table_open_cache_instances | 8     |
+----------------------------+-------+
3 rows in set (0.001 sec)

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'open_%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Open_files               | 13    |
| Open_streams             | 4     |
| Open_table_definitions   | 1988  |
| Open_tables              | 2000  |
| Opened_files             | 57625 |
| Opened_plugin_libraries  | 0     |
| Opened_table_definitions | 37812 |
| Opened_tables            | 38085 |
| Opened_views             | 0     |
+--------------------------+-------+
9 rows in set (0.001 sec)
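
To make the comparison between the two outputs explicit, here is a minimal Python sketch that pairs each status counter with the variable that appears to bound it. The pairing (Open_tables against table_open_cache, Open_table_definitions against table_definition_cache) and the helper names `parse_rows` and `cache_usage` are assumptions made for illustration; the numbers are copied from the session above.

```python
import re

# Rows copied from the MariaDB session above.
VARIABLES = """
| table_definition_cache     | 1024  |
| table_open_cache           | 2000  |
| table_open_cache_instances | 8     |
"""

STATUS = """
| Open_table_definitions   | 1988  |
| Open_tables              | 2000  |
"""

# Matches '| name | 123 |' rows; header and border lines do not match.
ROW = re.compile(r"^\|\s*(\w+)\s*\|\s*(\d+)\s*\|$")


def parse_rows(text):
    """Turn '| name | value |' console rows into a dict of ints."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows[match.group(1)] = int(match.group(2))
    return rows


def cache_usage(variables, status):
    """Return (in_use, limit) for each cache (assumed pairing)."""
    return {
        "table_open_cache": (
            status["Open_tables"],
            variables["table_open_cache"],
        ),
        "table_definition_cache": (
            status["Open_table_definitions"],
            variables["table_definition_cache"],
        ),
    }


usage = cache_usage(parse_rows(VARIABLES), parse_rows(STATUS))
for name, (in_use, limit) in usage.items():
    flag = "at or over limit" if in_use >= limit else "ok"
    print(f"{name}: {in_use}/{limit} ({flag})")
```

With the values above, both caches are reported as at or over their configured limit, which is consistent with the suspicion that we ran out of room for open tables.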

The config change is currently applied manually to production where the job succeeds.

Fring removed Fring as the assignee of this task.Oct 17 2023, 10:34 AM
Fring moved this task from Doing to In Review on the Wikibase Cloud (Kanban board Q4 2023) board.
Fring subscribed.
Evelien_WMDE claimed this task.