I'm running into an odd issue with performance of some queries on the replica database from the toolforge servers (known to happen both from the sge bastions and the grid itself). The following query:
SELECT bp.page_title, bp.page_id, pr_index.pr_count, ifnull(sum(pp_value=0 and pp_propname='proofread_page_quality_level'), 0) as q_0 FROM page AS sp INNER JOIN page AS bp ON(sp.page_title like concat(bp.page_title, '/%') and sp.page_namespace=104) INNER JOIN page_props ON(sp.page_id=pp_page) INNER JOIN pr_index ON(pr_page_id=bp.page_id) WHERE bp.page_id=?
Works correctly and swiftly (in miliseconds) for roughly 95% of pages in the database, but stalls for anywhere from 15 to 60 seconds for the remaining 5%. There seems to be no pattern to which 5%, but it is consistent (that is, the same page_id will either be always quick or always slow). For instance:
[...] WHERE bp.page_id=2894727;
returns in miliseconds, whereas
[...] WHERE bp.page_id=2894947;
always takes over 16s and sometimes as much as 40s.
The query always returns exactly one row, and the stalled queries do not appear to relate to the number of subpages that are involved. Stalled queries are shown in the processlist as being in state 'Sending data'.
@Bstorm found that while the issue can be replicated from the toolforge bastions, making the same queries directly on the DBs have all queries return nearly instantly.