Page MenuHomePhabricator

Database::upsert() for Postgres triggers an (ignored) error, by design
Closed, ResolvedPublic

Description

MediaWiki	1.30.0-alpha (1ba29a5)
PHP	7.1.1 (apache2handler)
PostgreSQL	9.3.12
ICU	57.1
[DBQuery] Schema "mediawiki" added to the search path

[DBConnection] Connected to database 0 at 'localhost'.
[DBQuery] Schema "mediawiki" added to the search path

[DBConnection] Connected to database 0 at 'localhost'.
[SQLBagOStuff] Connection Resource id #319 will be used for SqlBagOStuff
[DBQuery] Transaction state: savepoint "mw" established.

[DBQuery] SQL ERROR (ignored): ERROR:  duplicate key value violates unique constraint "md_module_skin"
DETAIL:  Key (md_module, md_skin)=(mediawiki.helplink, vector|en) already exists.


[DBQuery] Transaction state: savepoint "mw" rolled back.

[SQLBagOStuff] Connection Resource id #318 will be used for SqlBagOStuff
[DBConnection] Closing connection to database 'localhost'.
[DBQuery] Transaction state: savepoint "mw" established.

[DBQuery] Transaction state: savepoint "mw" released.

[DBQuery] Transaction state: savepoint "mw" established.

[DBQuery] SQL ERROR (ignored): ERROR:  duplicate key value violates unique constraint "md_module_skin"
DETAIL:  Key (md_module, md_skin)=(mediawiki.legacy.shared, vector|en) already exists.


[DBQuery] Transaction state: savepoint "mw" rolled back.

[DBQuery] Transaction state: savepoint "mw" established.

[DBQuery] SQL ERROR (ignored): ERROR:  duplicate key value violates unique constraint "md_module_skin"
DETAIL:  Key (md_module, md_skin)=(mediawiki.skinning.interface, vector|en) already exists.


[DBQuery] Transaction state: savepoint "mw" rolled back.

[DBQuery] Transaction state: savepoint "mw" established.

[DBQuery] SQL ERROR (ignored): ERROR:  duplicate key value violates unique constraint "md_module_skin"
DETAIL:  Key (md_module, md_skin)=(skins.vector.styles, vector|en) already exists.


[DBQuery] Transaction state: savepoint "mw" rolled back.

Event Timeline

Adding or updating rows to the module_deps table only happens in one place, and it uses the Database::upsert() method to naturally not create duplicate rows for the same key.

https://github.com/wikimedia/mediawiki/blob/1.29.0-rc.1/includes/resourceloader/ResourceLoaderModule.php#L496

If this causes errors, I suspect then that the DatabasePostgres implementation might need special attention. The default implementation inherited from DatabaseBase uses a transaction to first try to update an existing row, and otherwise insert a new one.

Did that not work?

The warning says "SQL ERROR (ignored)" which afaik is a DEBUG-level message for debugging/development purposes only (to try and understand what the code is doing). This does not seem indicative of any problem?

Did you find a WARNING or ERROR level message nearby in the logs? Or a user-facing exception page, or otherwise finding that the query in question did not succeed by other means?

Krinkle triaged this task as Medium priority.Jul 7 2017, 11:24 PM
Krinkle removed a project: MediaWiki-ResourceLoader.

Did that not work?

That is the only thing in the log output therefore I cannot judge its severity in terms of "[DBQuery] SQL ERROR (ignored): ERROR: duplicate key value violates ...".

Did you find a WARNING or ERROR level message nearby in the logs?

No.

Or a user-facing exception page, or otherwise finding that the query in question did not succeed by other means?

No.

Krinkle closed this task as Declined.EditedJul 29 2017, 4:31 AM

Based on that, I'll close this issue then. I agree that its hard to judge severity from the text message (especially with words like "ERROR" everywhere).

I'd recommend configuring your log file format so that the severity level is included for your convenience. Similar to what one might see in Syslog. When I said earlier that this is a DEBUG level message, I meant that, literally, there is meta-data associated with this log entry that has explicit field set with severity => DEBUG.

Changing your log file format may be complication, but at the very least you can start by configuring which severity levels you are not interested in. (For a production level you'll probably want to reduce log entries to only WARNING and higher.)

See wgDebugLogGroups for this. Setting 'level' => \Psr\Log\LogLevel::WARNING, will configure the channel to only log entries with WARNING or ERROR.

This has been possible since the introduction of "Structured logging" in MediaWiki (following PSR-3).

The default text file log format in MediaWiki, unfortunately, has not been changed, and still doesn't include any of this meta data. (Preserved as "LegacyLogger" in the new system.)

See also:

Legoktm added a subscriber: Legoktm.

Something is wrong here. Maybe the default log verbosity needs to be changed, but if these errors are expected (which it sounds like they are), they shouldn't be showing up in logs as errors by default.

Legoktm renamed this task from ERROR: duplicate key value violates unique constraint "md_module_skin" to ERROR: duplicate key value violates unique constraint "md_module_skin" with Postgres.Sep 29 2017, 4:54 PM
Legoktm added a subscriber: IijimaYun.

Something is wrong here. Maybe the default log verbosity needs to be changed, but if these errors are expected (which it sounds like they are), they shouldn't be showing up in logs as errors by default.

I could be wrong, but afaik these are not showing up "as errors". It looks like this is a paste from debug.log, which (by design) contains lots of non-error messages for debugging purposes.

The confusing part here is that Database is logging the internally caught non-error into the debug channel for debugging purposes. The only difference between it and a real error is the severity level, and the indication "(ignored)". And our default legacy/debug line format does not include the severity level. Perhaps we should change that, but that's a separate task.

@Legoktm I'm fine with keeping this open, but then I'd prefer we move it from MediaWiki-Database to MediaWiki-Debug-Logging and refocus the title.

I would like to reiterate that this error appears in PostgreSQL error log with a severity of ERROR. Debug log is off at least in my case.

Oct  4 07:38:42 db postgres[96056]: [9-1] ERROR:  duplicate key value violates unique constraint "md_module_skin"
Oct  4 07:38:42 db postgres[96056]: [9-2] DETAIL:  Key (md_module, md_skin)=(mediawiki.action.view.postEdit, vector|en-gb) already exists.

I would like to reiterate that this error appears in PostgreSQL error log with a severity of ERROR. Debug log is off at least in my case.

Oct  4 07:38:42 db postgres[96056]: [9-1] ERROR:  duplicate key value violates unique constraint "md_module_skin"
Oct  4 07:38:42 db postgres[96056]: [9-2] DETAIL:  Key (md_module, md_skin)=(mediawiki.action.view.postEdit, vector|en-gb) already exists.

That is the PostgreSQL error log, not the MediaWiki error log.

Krinkle renamed this task from ERROR: duplicate key value violates unique constraint "md_module_skin" with Postgres to Database::upsert() for Postgres triggers an (ignored) error, by design.Oct 3 2017, 10:51 PM
Krinkle added a project: MW-1.29-release.
Krinkle moved this task from Usage problem to Rdbms library on the Wikimedia-Rdbms board.

The confusing part here is that Database is logging the internally caught non-error into the debug channel for debugging purposes.

Well, then it means something is off base and your constant refocus on pushing this towards some kind of logging severity non DB issue isn't actually addressing the cause.

That is the PostgreSQL error log, not the MediaWiki error log.

If PostgreSQL sees this as an ERROR severity, yet you are arguing it ought not to be a MediaWiki issue because you think the debug logging is incorrect then I would ask for some concise evidence that supports your hypothesis. Currently, the only evidence we have is that from PostgreSQL which classifies it as an ERROR.

As as noted before, the issue was not present before MW 1.29.

Side note: technically there are no problems that could lead to any kind of data loss or corruption (this happens inside a transaction), but I am not sure it is good practice to issue statements that lead to messages flooding the database server's error log.

I have multiple instances of MediaWiki 1.26.x that I can, if needed, step-by-step upgrade to 1.29 to find which release exactly starts exhibiting this behaviour. I *think* it was 1.29 because I had a 1.28.x instance that I upgraded to 1.29 for testing purposes which led to these errors appearing, but sadly I am not 100% sure about the exact version number prior to upgrade; it may have been 1.27.

This issue does not require any actual data in the wiki, a new installation is enough to get it to trigger. I spent a whole day trying to resolve this issue thinking it was somehow database getting corrupted and this was the observation I ended with. MediaWiki's database abstraction layer PHP code is a bit too arcane for me but I will try looking into it again and will report here if I find the particular commit that brought this in.

Adding or updating rows to the module_deps table only happens in one place, and it uses the Database::upsert() method to naturally not create duplicate rows for the same key.

https://github.com/wikimedia/mediawiki/blob/1.29.0-rc.1/includes/resourceloader/ResourceLoaderModule.php#L496

If this causes errors, I suspect then that the DatabasePostgres implementation might need special attention. The default implementation inherited from DatabaseBase uses a transaction to first try to update an existing row, and otherwise insert a new one.

This is true, the related part in the default database implementation is https://github.com/wikimedia/mediawiki/blob/1.29.0-rc.1/includes/libs/rdbms/database/Database.php#L2216, where the 'IGNORE' option is passed as an option to postgres' INSERT implementation, which uses savepoints and turns off error reporting.

I'm not sure where this fails, nevertheless I've specialized the upsert method for postgres using the standard INSERT ... ON CONFLICT ... UPDATE expression. It seems to work, but I've had to modify the insert method arguments and construct the ON CONFLICT clause using the uniqueIndexes argument from Database::upsert(), which not only makes the code less clean, but as the column keys in the ON CONFLICT clause really need to be unique indexes, this revealed the missing unique index in the mediawiki.redirect table over rd_from (this is a foreign key, but we don't want a page to be redirected to two or more other pages -- this never actually happened, I assume it's handled in the code, but the unique index is required in the ON CONFLICT clause). After creating the unique index, all error messages seem to be gone now.

Due to the implications of having to create the index and not having tested everything properly in addition to the need for refactoring (I haven't contributed to MediaWiki yet), I think it would be better to look into handling the savepoints, but I don't think I'll have time to look into it myself in near future. I'm including the patch for reference.

this revealed the missing unique index in the mediawiki.redirect table over rd_from (this is a foreign key, but we don't want a page to be redirected to two or more other pages -- this never actually happened, I assume it's handled in the code, but the unique index is required in the ON CONFLICT clause). After creating the unique index, all error messages seem to be gone now.

Interestingly enough, it does not seem to be handled in the PHP code, at least in my instance:

ERROR:  could not create unique index "redirect_rd_from"
DETAIL:  Key (rd_from)=(110) is duplicated.

I see 3-4 copies for each rd_from instance in the database:

=> select * from mediawiki.redirect order by rd_from;
<skip>
     225 |            0 | Gridley                       |              |
     225 |            0 | Gridley                       |              |
     225 |            0 | Gridley                       |              |
     225 |            0 | Gridley                       |              |
     226 |            0 | Craven                        |              |
     226 |            0 | Craven                        |              |
     226 |            0 | Craven                        |              |
     226 |            0 | Craven                        |              |
     227 |            0 | McCall                        |              |
     227 |            0 | McCall                        |              |
     227 |            0 | McCall                        |              |
     227 |            0 | McCall                        |              |

In fact every single run of rebuildall.php adds new duplicates to this table.

For those who need to clean their redirects table up before adding the index:

=> delete from mediawiki.redirect where ctid not in (SELECT MIN(ctid) FROM mediawiki.redirect GROUP BY rd_from,rd_namespace,rd_title,rd_interwiki,rd_fragment);
DELETE 757

@Ivor, I applied your patch 2 days ago and no errors so far, thank you. Benchmarks show database query time improvements of around 5% (probably thanks to no error logging going on any more, along with a better query plan).

The small problem with this patch, however, is that it needs some fallback method for PostgreSQL 9.4 and below; sadly not everyone runs 9.5 and 9.6 (there still are lots of 8.x instances too).

Further testing:

There might be a schema issue in category:

Oct 11 01:22:07 db postgres[83312]: [7-1] ERROR:  column reference "cat_pages" is ambiguous at character 1
91
Oct 11 01:22:07 db postgres[83312]: [7-2] STATEMENT:  INSERT /* WikiPage::updateCategoryCounts  */ INTO "c
ategory" (cat_title,cat_pages,cat_subcats,cat_files) VALUES ('North_Union','1','0','0') ON CONFLICT (cat_title) DO UPDAT
E SET cat_pages = cat_pages + 1 WHERE ((category.cat_title = 'North_Union'))

And a deadlock happens sometimes (so far 4 times over about 24 hours) in upsert in module_deps:

Oct 11 09:16:27 db postgres[23184]: [7-1] ERROR:  deadlock detected
Oct 11 09:16:27 db postgres[23184]: [7-2] DETAIL:  Process 23184 waits for ShareLock on transaction 112737; blocked by process 23183.
Oct 11 09:16:27 db postgres[23184]: [7-3]       Process 23183 waits for ShareLock on transaction 112738; blocked by process 23184.
Oct 11 09:16:27 db postgres[23184]: [7-4]       Process 23184: UPDATE /* Wikimedia\Rdbms\DatabasePostgres::upsert  */  "module_deps" SET md_deps = '["resources/src/mediawiki.widgets/mw.widgets.TitleWidget.less"]' WHERE ((module_deps.md_module = 'mediawiki.widgets') OR (module_deps.md_skin = 'vector|en-gb'))
Oct 11 09:16:27 db postgres[23184]: [7-5]       Process 23183: UPDATE /* Wikimedia\Rdbms\DatabasePostgres::upsert  */  "module_deps" SET md_deps = '["resources/src/mediawiki.less/mediawiki.mixins.less","resources/src/mediawiki.less/mediawiki.ui/variables.less","resources/src/mediawiki.ui/components/icons.less"]' WHERE ((module_deps.md_module = 'mediawiki.ui.icon') OR (module_deps.md_skin = 'minerva|en-gb'))
Oct 11 09:16:27 db postgres[23184]: [7-6] HINT:  See server log for query details.
Oct 11 09:16:27 db postgres[23184]: [7-7] CONTEXT:  while rechecking updated tuple (94,4) in relation "module_deps"
Oct 11 09:16:27 db postgres[23184]: [7-8] STATEMENT:  UPDATE /* Wikimedia\Rdbms\DatabasePostgres::upsert  */  "module_deps" SET md_deps = '["resources/src/mediawiki.widgets/mw.widgets.TitleWidget.less"]' WHERE ((module_deps.md_module = 'mediawiki.widgets') OR (module_deps.md_skin = 'vector|en-gb'))
Oct 11 09:16:28 db postgres[23184]: [8-1] ERROR:  current transaction is aborted, commands ignored until end of transaction block
Oct 11 09:16:28 db postgres[23184]: [8-2] STATEMENT:  BEGIN /* Wikimedia\Rdbms\DatabasePostgres::upsert  */
Oct 11 09:16:28 db postgres[23184]: [9-1] ERROR:  current transaction is aborted, commands ignored until end of transaction block
Oct 11 09:16:28 db postgres[23184]: [9-2] STATEMENT:  BEGIN /* Wikimedia\Rdbms\Database::query (Pingback::checkIfSent)  */

this revealed the missing unique index in the mediawiki.redirect table over rd_from (this is a foreign key, but we don't want a page to be redirected to two or more other pages -- this never actually happened, I assume it's handled in the code, but the unique index is required in the ON CONFLICT clause). After creating the unique index, all error messages seem to be gone now.

Interestingly enough, it does not seem to be handled in the PHP code, at least in my instance:

ERROR:  could not create unique index "redirect_rd_from"
DETAIL:  Key (rd_from)=(110) is duplicated.

@IijimaYun, you're right, I remembered I had to do the same procedure about a month ago. The redirect table shouldn't be this messy and should have the unique index nevertheless.

Thank you for trying the patch and sharing your experience. The ambiguity in column references can be easily avoided using table names as disambiguations, I'll look into that hopefully sometime soon. As for the deadlock, I'm not yet sure what causes this.

Yesterday I've upgraded PostgreSQL from 9.6 to 10, so far everything seems to run well (only one unrelated error with timestamps in a SELECT query), but it's only been a day and my MediaWiki instance is rarely used. Indeed, this would require a fallback for <9.4.

Re: the first issue, I checked the documentation and seems like changing in page/WikiPage.php:

$addFields = [ 'cat_pages = cat_pages + 1' ];
$removeFields = [ 'cat_pages = cat_pages - 1' ];
if ( $ns == NS_CATEGORY ) {
        $addFields[] = 'cat_subcats = cat_subcats + 1';
        $removeFields[] = 'cat_subcats = cat_subcats - 1';
} elseif ( $ns == NS_FILE ) {
        $addFields[] = 'cat_files = cat_files + 1';
        $removeFields[] = 'cat_files = cat_files - 1';
}

to

global $wgDBprefix;
$addFields = [ 'cat_pages = ' . $wgDBprefix . 'category.cat_pages + 1' ];
$removeFields = [ 'cat_pages = ' . $wgDBprefix . 'category.cat_pages - 1' ];
if ( $ns == NS_CATEGORY ) {
        $addFields[] = 'cat_subcats = ' . $wgDBprefix . 'category.cat_subcats + 1';
        $removeFields[] = 'cat_subcats = ' . $wgDBprefix . 'category.cat_subcats - 1';
} elseif ( $ns == NS_FILE ) {
        $addFields[] = 'cat_files = ' . $wgDBprefix . 'category.cat_files + 1';
        $removeFields[] = 'cat_files = ' . $wgDBprefix . 'category.cat_files - 1';
}

should be compatible with other database engines (yet once again I am left awestruck by this rather direct approach to SQL statement construction).

I am not sure about using $wgDBprefix like this, though.

It doesn't look like it'll be possible to have this fixed for 1.31. I'm hopeful that once T195807: Fix failing MediaWiki core tests on Postgres database backend we can start making some headway into other postgres tech debt.

@Ivor your patch fails in the first 4 chunks for REL_1.31 so I tried adapting it. Notably there is now an $ignore check which adds DO NOTHING to the query; I added a check for $set to avoid it if we already have an ON CONFLICT UPDATE added.

Obviously this will still not solve the constant deadlocks in module_deps

Anomie closed this task as Resolved.EditedDec 14 2018, 4:21 PM
Anomie claimed this task.
Anomie added a subscriber: Anomie.

It looks like the error in the task description was fixed for users of PostgreSQL 9.5+ in MediaWiki 1.31 with rMWcc0473766a53: rdbms: Remove support for PostgreSQL < 9.2, and improve INSERT IGNORE for 9.5, as calls to ->insert() with the IGNORE option no longer log an error to MediaWiki's log files. Thus I'm going to close this as Resolved.

For users of earlier versions of PostgreSQL, you'll have to deal with the ignored error being logged. You can consider the task closed as Declined instead if you'd like, since it's not going to be fixed for you. We should likely be dropping support for PG <9.5 soon anyway.

Any other issues that might have been raised in the comments should be filed as their own tasks.