Page MenuHomePhabricator

Add index to page.page_content_model
Open, HighPublic1 Story Points

Description

We want an index on page.page_content_model so we can easily figure out which pages have which content model.

Something like... CREATE INDEX /*i*/page_content_model ON /*_*/page (page_content_model); probably.

Event Timeline

Legoktm created this task.Jun 22 2015, 5:39 PM
Legoktm updated the task description. (Show Details)
Legoktm raised the priority of this task from to Needs Triage.
Legoktm added subscribers: Legoktm, Aklapper.
Catrope triaged this task as High priority.Jul 1 2015, 11:40 PM
Catrope set Security to None.
Catrope moved this task from Untriaged to Team discussion on the Collaboration-Team-Triage board.
DannyH edited a custom field.Jul 7 2015, 6:42 PM

Note T105652: RfC: Content model storage would replace the page.content_model column (a string such as "Flow-board") with page.content_model_id.

We want an index on page.page_content_model so we can easily figure out which pages have which content model.

Something like... CREATE INDEX /*i*/page_content_model ON /*_*/page (page_content_model); probably.

When you have agreed what change you want to do, and ready to deploy it, for now tell a DBA to proceed with the following data:

  • What change you want to do. A code diff is probably the most clear way.
  • Which wikis you want to apply it to. A pointer to a specific db list is preferred.
  • When to apply it (if it depends on a particular commit).
  • If the change is forward-compatible with the deployed codebase.

If there is any pending change on the same table, it is preferred to do both at the same time, as a schema change affects traffic quite significantly.

I do not monitor the schema-change tag, I do the Database and blocked-on-operations.

Having said that, an index on page_content_model may not be useful if search results account for more than 30% of the rows.

Having said that, an index on page_content_model may not be useful if search results account for more than 30% of the rows.

Could you clarify this?

Do you mean an index may not be sufficient if more than 30% of pages have content model X? A large amount (much more than 30%) have 'wikitext' content model.

The goal is to implement T63747: Flow should be able to enumerate all flow enabled pages on a wiki.

Danny_B moved this task from Unsorted to Add / Create on the Schema-change board.Apr 29 2016, 10:19 PM

A large amount (much more than 30%) have 'wikitext' content model.

Then an index there will not be useful in most cases (it will not speed up the search)- the mysql optimizer will ignore the usage of such index and do a full scan instead.

A large amount (much more than 30%) have 'wikitext' content model.

Then an index there will not be useful in most cases (it will not speed up the search)- the mysql optimizer will ignore the usage of such index and do a full scan instead.

I think we'd generally be looking for one of the rarer values of page_content_model. For example, T63747: Flow should be able to enumerate all flow enabled pages on a wiki asks for a special page (or similar UI) that would list all Flow pages on the wiki, and page_content_model='flow-board' is much rarer than 'wikitext'. Presumably an index would still help there, even if the majority of pages are 'wikitext'?