Page MenuHomePhabricator

Add index to page.page_content_model
Closed, DeclinedPublic1 Estimated Story Points

Description

We want an index on page.page_content_model so we can easily figure out which pages have which content model.

Something like... CREATE INDEX /*i*/page_content_model ON /*_*/page (page_content_model); probably.

Event Timeline

Legoktm raised the priority of this task from to Needs Triage.
Legoktm updated the task description. (Show Details)
Legoktm added subscribers: Legoktm, Aklapper.
Catrope triaged this task as High priority.Jul 1 2015, 11:40 PM
Catrope set Security to None.
Catrope moved this task from Untriaged to Team discussion on the Collaboration-Team-Triage board.

Note T105652: RfC: Content model storage would replace the page.content_model column (a string such as "Flow-board") with page.content_model_id.

We want an index on page.page_content_model so we can easily figure out which pages have which content model.

Something like... CREATE INDEX /*i*/page_content_model ON /*_*/page (page_content_model); probably.

When you have agreed what change you want to do, and ready to deploy it, for now tell a DBA to proceed with the following data:

  • What change you want to do. A code diff is probably the most clear way.
  • Which wikis you want to apply it to. A pointer to a specific db list is preferred.
  • When to apply it (if it depends on a particular commit).
  • If the change is forward-compatible with the deployed codebase.

If there is any pending change on the same table, it is preferred to do both at the same time, as a schema change affects traffic quite significantly.

I do not monitor the schema-change tag, I do the Database and blocked-on-operations.

Having said that, an index on page_content_model may not be useful if search results account for more than 30% of the rows.

Having said that, an index on page_content_model may not be useful if search results account for more than 30% of the rows.

Could you clarify this?

Do you mean an index may not be sufficient if more than 30% of pages have content model X? A large amount (much more than 30%) have 'wikitext' content model.

The goal is to implement T63747: Flow should be able to enumerate all flow enabled pages on a wiki.

A large amount (much more than 30%) have 'wikitext' content model.

Then an index there will not be useful in most cases (it will not speed up the search)- the mysql optimizer will ignore the usage of such index and do a full scan instead.

A large amount (much more than 30%) have 'wikitext' content model.

Then an index there will not be useful in most cases (it will not speed up the search)- the mysql optimizer will ignore the usage of such index and do a full scan instead.

I think we'd generally be looking for one of the rarer values of page_content_model. For example, T63747: Flow should be able to enumerate all flow enabled pages on a wiki asks for a special page (or similar UI) that would list all Flow pages on the wiki, and page_content_model='flow-board' is much rarer than 'wikitext'. Presumably an index would still help there, even if the majority of pages are 'wikitext'?

Note T230607: stop using page_content_model - I would suggest declining this, since the field should be removed instead

Aklapper lowered the priority of this task from High to Low.Apr 11 2020, 2:30 PM

Lowering priority per last comment - could someone comment if this should be declined in favor of T230607: stop using page_content_model, please?

@ MediaWiki-ContentHandler folks: Should this task be declined in favor of T230607: stop using page_content_model or not?
Thanks in advance for a reply / decision.

Boldly declining in favor of T230607