Page MenuHomePhabricator

CodeRevisionListView::getRevCount is creating slow queries on mediawiki.org
Open, LowPublic

Description

SELECT /* CodeRevisionListView::getRevCount <ip> */ COUNT( DISTINCT cr_id ) AS rev_count FROM `code_rev` LEFT JOIN `code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = '1' LIMIT 1

Full list at P8182, creating 1052 20+ second queries on mediawikiwiki (s3).

The extension should be fixed to avoid creating long running queries or (it says here it should no longer be used on WMF: https://www.mediawiki.org/wiki/Extension:CodeReview ), fully deprecated from WMF infrastructure and undeployed.

Event Timeline

jcrespo created this task.Mar 12 2019, 9:35 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 12 2019, 9:35 AM
hashar added a subscriber: hashar.Mar 12 2019, 10:07 AM

That SQL query has been in the code and deployed since at least 2011. The reason is most definitely web crawler browsing through some url of https://www.mediawiki.org/wiki/Special:Code . I can not tell though which exact path is being hit :(

Going through EXPLAIN shows the query is rather straightforward using indices and going through 62k rows:

10.64.16.191(mediawikiwiki)> EXPLAIN SELECT /* CodeRevisionListView::getRevCount XXXX */ COUNT( DISTINCT cr_id ) AS rev_count FROM `code_rev` LEFT JOIN `code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = '1' LIMIT 1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: code_rev
         type: ref
possible_keys: PRIMARY,cr_repo_id,cr_repo_author,cr_repo_status_author
          key: cr_repo_id
      key_len: 4
          ref: const
         rows: 61976
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: code_comment
         type: ref
possible_keys: cc_repo_id,cc_repo_time,cc_author
          key: cc_repo_id
      key_len: 8
          ref: mediawikiwiki.code_rev.cr_repo_id,mediawikiwiki.code_rev.cr_id
         rows: 1
        Extra: Using index
2 rows in set (0.00 sec)

Running it in production:

)> SELECT /* CodeRevisionListView::getRevCount XXXX */ COUNT( DISTINCT cr_id ) AS rev_count FROM `code_rev` LEFT JOIN `code_comment` ON ((cc_repo_id = cr_repo_id AND cc_rev_id = cr_id)) WHERE cr_repo_id = '1' LIMIT 1 \G
*************************** 1. row ***************************
rev_count: 115792
1 row in set (0.29 sec)

Could it be that too many have/are being done at the same time and some kind of lock happens on the database that causes them to wait? Note the query is done on the replica (or should).

greg added a subscriber: greg.

Just removing our team project as we probably won't work on fixing this issue specifically. Rather we want to work on removing this extension from production.

Yes sorry, your team tag was added when I wasn't aware of T116948

Krinkle triaged this task as Low priority.Apr 3 2019, 2:13 AM
Jdforrester-WMF changed the status of subtask T116948: Undeploy CodeReview from Open to Stalled.Apr 8 2019, 4:30 PM
hashar removed a subscriber: hashar.Apr 9 2019, 9:54 AM