Page MenuHomePhabricator

$wgMaxCredits breaks MW on PostgreSQL
Closed, ResolvedPublic

Description

Author: gregab

Description:
Hello, all,

I was recently installing MW at a site after a long time again.

Platform: CentOS
MediaWiki: 1.15.1
Database: PostgreSQL 8.1.11

While configuring it, it choke when I set $wgMaxCredits to 3 in LocalSettings.php. The exact error is attached below.

The PostgreSQL error message made it apparent immediately that GROUP BY was not going to be the thing to solve what getCredits() was supposed to do, so I fixed the query to use DISTINCT ON. Here's the patch:

$hideBit = Revision::DELETED_USER; // username hidden?
  • $sql = "SELECT {$userTable}.*, MAX(rev_timestamp) as timestamp

+ $sql = "SELECT
+ DISTINCT ON (user_id)
+ {$userTable}.*,
+ rev_timestamp as timestamp

FROM $revTable LEFT JOIN $userTable ON rev_user = user_id
WHERE rev_page = $pageId
AND rev_user != $user
AND rev_deleted & $hideBit = 0
  • GROUP BY rev_user, rev_user_text, user_real_name
  • ORDER BY timestamp DESC";

+ ORDER BY user_id, timestamp DESC";

if($limit > 0) { $sql .= ' LIMIT '.$limit; }
if($offset > 0) { $sql .= ' OFFSET '.$offset; }

The sad thing is, it won't work with MySQL, of course, and I have no idea on how you MW guys circumvent such stuff.

This is the original error message:

A database error has occurred
Query: SELECT mwuser.*, MAX(rev_timestamp) as timestamp
FROM revision LEFT JOIN mwuser ON rev_user = user_id
WHERE rev_page = 6
AND rev_user != 2
AND rev_deleted & 4 = 0
GROUP BY rev_user, rev_user_text, user_real_name
ORDER BY timestamp DESC
Function: Article::getContributors
Error: 1 ERROR: column "mwuser.user_id" must appear in the GROUP BY clause or be used in an aggregate function

Backtrace:

#0 /var/www/wiki/includes/db/Database.php(616): DatabasePostgres->reportQueryError('ERROR: column ...', 1, 'SELECT mwuser.*...', 'Article::getCon...', false)
#1 /var/www/wiki/includes/Article.php(715): Database->query('SELECT mwuser.*...', 'Article::getCon...')
#2 /var/www/wiki/includes/Credits.php(103): Article->getContributors()
#3 /var/www/wiki/includes/Credits.php(65): Credits::getContributors(Object(Article), 2, true)
#4 /var/www/wiki/includes/SkinTemplate.php(371): Credits::getCredits(Object(Article), 3, true)
#5 /var/www/wiki/includes/OutputPage.php(968): SkinTemplate->outputPage(Object(OutputPage))
#6 /var/www/wiki/includes/Wiki.php(345): OutputPage->output()
#7 /var/www/wiki/index.php(117): MediaWiki->finalCleanup(Array, Object(OutputPage))
#8 {main}


Version: 1.20.x
Severity: major

Details

Reference
bz21196

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.
StatusSubtypeAssignedTask
InvalidNone
ResolvedNone

Event Timeline

bzimport raised the priority of this task from to Low.Nov 21 2014, 10:48 PM
bzimport added a project: Wikimedia-Rdbms.
bzimport set Reference to bz21196.
bzimport added a subscriber: Unknown Object (MLST).

-need-review, -patch : Patch is not viable since it doesn't work on MySQL by the submitters own admissions
-testme : This is used for old bugs when we aren't sure are still bugs

gregab wrote:

Oops, sorry about the testme tag... I guess I didn't read carefully enough. :-/

Anyways, having some modest interest in this being fixed, I'm going to persevere, ;-)

The other two options would be to:

  • use named columns from mwuser and group by each and every one of them or
  • switch the join tables and make the other table a subselect - afterall, the query wants user data and just uses revisions as a means to weed records out

Both (especially the other one, which is better in terms of performance, but more questionable in terms of compatibility, see query plans below) work in MySQL 4.1 and newer as well (http://dev.mysql.com/doc/refman/4.1/en/unnamed-views.html).

Forgot to mention yesterday, both changes also fix for 18078 - it's very related.

Hope this helped in any way.

Kind regards,
Grega

The two queries:

a) group by

SELECT m.user_id, m.user_name, m.user_real_name, m.user_email,

   m.user_editcount, MAX(r.rev_timestamp) AS timestamp
FROM $userTable AS m
LEFT JOIN $revTable AS r ON r.rev_user = m.user_id
WHERE r.rev_page = $pageId AND
      r.rev_user != $user AND
      r.rev_deleted & $hideBit = 0
GROUP BY m.user_id, m.user_name, m.user_real_name,
         m.user_email, m.user_editcount
ORDER BY timestamp DESC

b) subselect

SELECT m.*, r.timestamp

    FROM $userTable AS m
    JOIN (SELECT rev_user, MAX(rev_timestamp) AS timestamp
              FROM $revTable
              WHERE rev_page = $pageId AND
	            rev_user != $user AND
                    rev_deleted & $hideBit = 0
             GROUP BY rev_user
             ORDER BY timestamp DESC) AS r
    ON m.user_id = r.rev_user;

What the PostgreSQL planner thinks of the two:

wiki=# explain analyze select m.user_id, m.user_name, m.user_real_name, m.user_email, m.user_editcount, max(r.rev_timestamp) as timestamp from mwuser m left join revision r on r.rev_user = m.user_id where r.rev_page = 3 and r.rev_user != 1 and r.rev_deleted & 4 = 0 group by m.user_id, m.user_name, m.user_real_name, m.user_email, m.user_editcount order by timestamp desc;

QUERY PLAN

Sort (cost=7.50..7.51 rows=1 width=112) (actual time=0.361..0.363 rows=1 loops=1)

Sort Key: (max(r.rev_timestamp))
Sort Method:  quicksort  Memory: 25kB
->  HashAggregate  (cost=7.48..7.49 rows=1 width=112) (actual time=0.344..0.346 rows=1 loops=1)
      ->  Nested Loop  (cost=0.00..7.47 rows=1 width=112) (actual time=0.068..0.296 rows=6 loops=1)
            Join Filter: (m.user_id = r.rev_user)
            ->  Seq Scan on revision r  (cost=0.00..6.33 rows=1 width=12) (actual time=0.031..0.114 rows=6 loops=1)
                  Filter: ((rev_user <> 1) AND (rev_page = 3) AND (((rev_deleted)::integer & 4) = 0))
            ->  Seq Scan on mwuser m  (cost=0.00..1.06 rows=6 width=104) (actual time=0.003..0.012 rows=6 loops=6)

Total runtime: 0.521 ms
(10 rows)

wiki=# explain analyze select m.user_id, m.user_name, m.user_real_name, m.user_email, m.user_editcount, r.timestamp from mwuser m join (select rev_user, max(rev_timestamp) as timestamp from revision where rev_page = 3 and rev_user != 1 and rev_deleted & 4 = 0 group by rev_user order by timestamp desc) r on m.user_id = rev_user;

QUERY PLAN

Hash Join (cost=6.39..7.48 rows=1 width=112) (actual time=0.244..0.250 rows=1 loops=1)

Hash Cond: (m.user_id = r.rev_user)
->  Seq Scan on mwuser m  (cost=0.00..1.06 rows=6 width=104) (actual time=0.013..0.024 rows=6 loops=1)
->  Hash  (cost=6.37..6.37 rows=1 width=12) (actual time=0.191..0.191 rows=1 loops=1)
      ->  Subquery Scan r  (cost=6.36..6.37 rows=1 width=12) (actual time=0.181..0.186 rows=1 loops=1)
            ->  Sort  (cost=6.36..6.36 rows=1 width=12) (actual time=0.176..0.178 rows=1 loops=1)
                  Sort Key: (max(revision.rev_timestamp))
                  Sort Method:  quicksort  Memory: 25kB
                  ->  HashAggregate  (cost=6.34..6.35 rows=1 width=12) (actual time=0.155..0.160 rows=1 loops=1)
                        ->  Seq Scan on revision  (cost=0.00..6.33 rows=1 width=12) (actual time=0.027..0.122 rows=6 loops=1)
                              Filter: ((rev_user <> 1) AND (rev_page = 3) AND (((rev_deleted)::integer & 4) = 0))

Total runtime: 0.385 ms
(12 rows)

Applied a conditional group by in r59123

Need a lot more columns than the ones in the example: rolled back the change until I have time to dig in again.

  • Bug 22107 has been marked as a duplicate of this bug. ***

s_angelov wrote:

Added dependency to Bug 384 for tracking.

  • Bug 28193 has been marked as a duplicate of this bug. ***

Reverted in r87152: causes failures like this (seen when attempting to log in)

SkinTemplate::makeTalkUrlDetails given invalid pagename User:

Backtrace:

#0 /var/www/trunk/includes/SkinTemplate.php(691): SkinTemplate->makeTalkUrlDetails('User:')
#1 /var/www/trunk/includes/SkinTemplate.php(495): SkinTemplate->buildPersonalUrls(Object(OutputPage))
#2 /var/www/trunk/includes/OutputPage.php(1906): SkinTemplate->outputPage(Object(OutputPage))
#3 /var/www/trunk/includes/Wiki.php(402): OutputPage->output()
#4 /var/www/trunk/index.php(146): MediaWiki->finalCleanup()
#5 {main}

Most of r87164 doesn't seem to have anything obvious to do with the query & postgres, but seems to be retooling User so it doesn't bother fetching full row data during later processing on the results, if you only ask for fields that were already provided.

Could perhaps do with some regression test cases on things like this?

sumanah wrote:

Grega Bremec, are you still running into this problem with MediaWiki 1.17.0?

overlordq wrote:

This still exists up to 1.19

2011-09-26 22:00:48 UTC ERROR: function first(text) does not exist at character 102
2011-09-26 22:00:48 UTC HINT: No function matches the given name and argument types. You might need to add explicit type casts.
2011-09-26 22:00:48 UTC STATEMENT: SELECT /* WikiPage::getContributors 65.65.116.130 */ rev_user as user_id,rev_user_text AS user_name,FIRST(user_real_name) AS user_real_name,MAX(rev_timestamp) AS timestamp FROM "revision" LEFT JOIN mwuser ON ((rev_user = user_id)) WHERE rev_page = '1' AND (rev_user != 1) AND ((rev_deleted & 4) = 0) GROUP BY rev_user,rev_user_text ORDER BY timestamp DESC

As reported by hashar in bug 33686:

Fixed in trunk by r98222. Richard, you can manually apply the patch to fix the
issue (aka replace FIRST() by MIN()).

Merged in REL1_18 with r108767.

And indeed, as of r113993 I can't reproduce it with PostgreSQL anymore.

Jdforrester-WMF added a subscriber: Jdforrester-WMF.

Migrating from the old tracking task to a tag for PostgreSQL-related tasks.