Page MenuHomePhabricator

create revision_commentindex
Open, Needs TriagePublic

Description

While trying to query revision table based on comments in toolforge environment, I noticed awful performance for simple query like:
select rev_id, comment_text from revision inner join comment on rev_comment_id=comment_id where comment_text like '%CSD#G11%' limit 10;

Following the good work in T166733 and related tasks, I think it may be cool to have revision_commentindex similar to revision_userindex in toolsforge for use by tools that query based on comments.

Event Timeline

eranroz created this task.Nov 4 2018, 8:52 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptNov 4 2018, 8:52 PM
Krenair added a subscriber: Krenair.Nov 4 2018, 9:14 PM
JJMC89 edited projects, added Data-Services; removed Cloud-Services.Nov 4 2018, 9:50 PM
JJMC89 moved this task from Backlog to Wiki replicas on the Data-Services board.

I note there are likely three sources of slowness here:

  1. Scanning comment for comment_text like '%CSD#G11%' can't use any indexes.
  2. Accessing comment in general is slow thanks to having to do many subqueries per row to check visibility.
  3. The query can't make use of the revcomment_comment_id index, because the revision view has that field aliased.

This request would help with #3. #2 is being discussed as T215445: comment and actor view challenges for Cloud Services. I doubt there's anything we can do about #1, that part will always be slow.

Bstorm added a subscriber: Bstorm.Jun 4 2019, 8:50 PM
Krinkle added a subscriber: Krinkle.

Improving #2 and #3 would be a big win I think. On the bigger wikis, these queries currently time out for me after an hour or so.

Can we get anywhere by using the comment_revision view instead of comment? https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_Replicas#The_actor_table_seems_really_slow--so_does_comment

Revision is a weird table (because of the temp table), so I'm curious if that will help @Anomie
That table only needs to check revision instead of 7 other subqueries.

Using comment_revision would help with #2 by reducing the number of subqueries needed.

I'd rather not make a slippery slope argument, but I must admit it's there in my mind. Revision is the table where I'd want this kind of thing vs. probably anywhere else.

We are proliferating views rather quickly lately, and the setup is becoming steadily more complex to maintain and use. The value of a commentindex view will drop off entirely once the temp table is no longer there, so we'd want to leave a comment to that effect, I think, so that when the aliasing is dropped, the commentindex view is deprecated (and dropped with warning) as well.