Page MenuHomePhabricator

API: Fatal error with ucprop=patrolled
Closed, ResolvedPublic

Description

Author: matthew.britton

Description:
http://en.wikipedia.org/w/api.php?action=query&list=usercontribs&ucuser=Gurch&ucprop=patrolled

gives

<api>
<error code="internal_api_error_DBQueryError" info="Database query error">

#0 /usr/local/apache/common-local/php-1.5/includes/db/Database.php(591): Database->reportQueryError('The SELECT woul...', 1104, 'SELECT rev_tim...', 'ApiQueryContrib...', false)
#1 /usr/local/apache/common-local/php-1.5/includes/db/Database.php(1001): Database->query('SELECT rev_tim...', 'ApiQueryContrib...')
#2 /usr/local/apache/common-local/php-1.5/includes/api/ApiQueryBase.php(222): Database->select(Array, Array, Array, 'ApiQueryContrib...', Array, Array)
#3 /usr/local/apache/common-local/php-1.5/includes/api/ApiQueryUserContributions.php(83): ApiQueryBase->select('ApiQueryContrib...')
#4 /usr/local/apache/common-local/php-1.5/includes/api/ApiQuery.php(213): ApiQueryContributions->execute()
#5 /usr/local/apache/common-local/php-1.5/includes/api/ApiMain.php(427): ApiQuery->execute()
#6 /usr/local/apache/common-local/php-1.5/includes/api/ApiMain.php(260): ApiMain->executeAction()
#7 /usr/local/apache/common-local/php-1.5/includes/api/ApiMain.php(244): ApiMain->executeActionWithErrorHandling()
#8 /usr/local/apache/common-local/php-1.5/api.php(77): ApiMain->execute()
#9 /usr/local/apache/common-local/live-1.5/api.php(3): require('/usr/local/apac...')
#10 {main}

</error>
</api>


Version: unspecified
Severity: normal
URL: http://en.wikipedia.org/w/api.php?action=query&list=usercontribs&ucuser=Gurch&ucprop=patrolled

Details

Reference
bz17215

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 10:24 PM
bzimport added a project: MediaWiki-API.
bzimport set Reference to bz17215.

Bryan.TongMinh wrote:

The error 1104 is: ERROR 1104: The SELECT would examine more rows than MAX_JOIN_SIZE.

Obviously a join that is going wrong somewhere.

ayg wrote:

What's the query that would be run?

(In reply to comment #2)

What's the query that would be run?

SELECT /* ApiQueryContributions::execute Catrope */ rev_timestamp,page_namespace,page_title,rev_user_text,rc_patrolled
FROM page,revision FORCE INDEX (usertext_timestamp)
LEFT JOIN recentchanges ON ((rc_this_oldid=rev_id))
WHERE (page_id=rev_page)
AND rev_deleted = '0'
AND rev_user_text = 'Gurch'
ORDER BY rev_user_text DESC, rev_timestamp DESC
LIMIT 11

I personally don't see how this produces huge joins, as we're joining on unique indices, but then I've been known to be completely wrong about DB-related stuff before :D

ayg wrote:

This is the explain I get on the toolserver:

mysql> EXPLAIN SELECT rev_timestamp,page_namespace,page_title,rev_user_text,rc_patrolled FROM page,revision LEFT JOIN recentchanges ON ((rc_this_oldid=rev_id)) WHERE (page_id=rev_page) AND rev_deleted = '0' AND rev_user_text = 'Gurch' ORDER BY rev_user_text DESC, rev_timestamp DESC LIMIT 11;
+----+-------------+---------------+--------+-------------------------------------------+--------------------+---------+--------------------------+---------+----------------------------------------------+

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra

+----+-------------+---------------+--------+-------------------------------------------+--------------------+---------+--------------------------+---------+----------------------------------------------+

1SIMPLErevisionrefPRIMARY,page_timestamp,usertext_timestampusertext_timestamp257const187372Using where; Using temporary; Using filesort
1SIMPLErecentchangesALLNULLNULLNULLNULL6666524
1SIMPLEpageeq_refPRIMARYPRIMARY4enwiki.revision.rev_page1

+----+-------------+---------------+--------+-------------------------------------------+--------------------+---------+--------------------------+---------+----------------------------------------------+
3 rows in set (0.01 sec)

Note join type ALL, key NULL. I only have views on the toolserver, not the actual tables, so I can't try forcing rc_patrolling as the index to use for recentchanges there (nor can I use the FORCE INDEX that was already in the query). When I run it on localhost, it chooses rc_patrolling for the recentchanges join.

(In reply to comment #4)

Note join type ALL, key NULL. I only have views on the toolserver, not the
actual tables, so I can't try forcing rc_patrolling as the index to use for
recentchanges there (nor can I use the FORCE INDEX that was already in the
query).

Why not?

When I run it on localhost, it chooses rc_patrolling for the
recentchanges join.

For me too, except that on one of my two computers, I don't *have* an rc_patrolling index, while on the other one I do. The index also isn't listed in tables.sql, and there's no updater patch that adds it. Where did my index go? :O :D

Back on topic: maybe adding FORCE INDEX(rc_patrolling) will fix this, provided that index is present on the WMF DB servers. You also seem to indicate that the effect of this FORCE INDEX can't be tested on the toolserver for some reason.

(In reply to comment #5)

The index also isn't listed
in tables.sql, and there's no updater patch that adds it. Where did my index
go? :O :D

Deeper research (thanks to TortoiseSVN's search facilities) turned up r25527 in which Domas removed the rc_patrolling index (added by Greg in r24699) for no apparent reason and without adding a remover in updaters.inc .

ayg wrote:

(In reply to comment #5)

Why not?

FORCE INDEX doesn't work on views, and non-root toolserver users only have access to views.

For me too, except that on one of my two computers, I don't *have* an
rc_patrolling index, while on the other one I do. The index also isn't listed
in tables.sql, and there's no updater patch that adds it. Where did my index
go? :O :D

Um, yeah, that would explain it. See r25527. The index doesn't exist. I've dropped it from localhost so I don't get confused in the future. I don't see any good way to get pages' patrolled status, as long as it's only in RC -- why isn't it in revision as well, again? Grr.

(simulpost)

(In reply to comment #6)

Deeper research (thanks to TortoiseSVN's search facilities) turned up r25527 in
which Domas removed the rc_patrolling index (added by Greg in r24699) for no
apparent reason and without adding a remover in updaters.inc .

The reason was almost certainly just because having unneeded indexes is slow. Is there any compelling justification for keeping this? What actually would need to use it other than some API queries?

(In reply to comment #7)

The reason was almost certainly just because having unneeded indexes is slow.
Is there any compelling justification for keeping this? What actually would
need to use it other than some API queries?

It makes it possible to find the recentchanges entries for a set of revisions/revids (either with a join on the revision table or with a WHERE clause), which is generic enough to warrant an index IMO, even if it's just an index on rc_this_oldid.

ayg wrote:

That's not an access pattern RC is supposed to support, though. It's really just supposed to summarize other tables. If you have the revision id's already, all relevant info should be in other tables like revision and page. The problem is, for some reason rc_patrolled is not.

(In reply to comment #9)

That's not an access pattern RC is supposed to support, though. It's really
just supposed to summarize other tables. If you have the revision id's
already, all relevant info should be in other tables like revision and page.
The problem is, for some reason rc_patrolled is not.

Exactly, same story with the bot flag. So we either need to move/copy the patrolled flag elsewhere, or add that index.

matthew.britton wrote:

(In reply to comment #10)

Exactly, same story with the bot flag. So we either need to move/copy the
patrolled flag elsewhere, or add that index.

In the meantime would it be wise to remove 'patrolled' as an option for ucprop and/or handle this error more gracefully, since it's unlikely to start working again by iteslf?

(In reply to comment #11)

In the meantime would it be wise to remove 'patrolled' as an option for ucprop
and/or handle this error more gracefully, since it's unlikely to start working
again by iteslf?

Temporarily disabled in r46795

Re-enabled in r46825 with a smart JOIN on the timestamp first (indexed) and the revid second (not indexed, but usually a no-op or a very small set since two revs with the same timestamp by the same user are pretty rare). Thanks to Tim for the tip.