Introduce a setting for entity types supported by Special:EntitiesWithoutLabel / Special:EntitiesWithoutDescription
Closed, Resolved · Public

Description

This is needed if we want to switch the SqlEntitiesWithoutTermFinder implementation away from the wb_entity_per_page table. Joining against the page table instead, by applying a string REPLACE() to the entity ID, is less efficient, so we need a way to restrict that query in production to properties, which amount to only a few thousand rows.
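A minimal sketch of the kind of allowlist such a setting would provide, written in Python for illustration (Wikibase itself is PHP). The setting key `entity_types_without_term_listings` and the helper `resolve_entity_type()` are hypothetical names, not the actual Wikibase configuration:

```python
# Hypothetical settings dict; the real Wikibase setting name may differ.
SETTINGS = {
    # Entity types for which Special:EntitiesWithoutLabel and
    # Special:EntitiesWithoutDescription may run the (less efficient) term query.
    "entity_types_without_term_listings": ["property"],
}


def resolve_entity_type(requested, settings=SETTINGS):
    """Return the entity types to query, or raise if the request is not enabled.

    `requested` is the type passed to the special page (None means "all
    enabled types"). Only types listed in the setting are allowed, so a
    large type such as "item" can be switched off in production.
    """
    allowed = settings["entity_types_without_term_listings"]
    if requested is None:
        return list(allowed)
    if requested not in allowed:
        raise ValueError(f"entity type {requested!r} is not supported here")
    return [requested]


if __name__ == "__main__":
    print(resolve_entity_type(None))        # ['property']
    print(resolve_entity_type("property"))  # ['property']
    try:
        resolve_entity_type("item")
    except ValueError as err:
        print(err)  # entity type 'item' is not supported here
```

Falling back to the configured list when no type is requested keeps the special page usable, while an explicit request for "item" fails immediately instead of triggering a multi-million-row scan.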

I just discussed with Daniel whether we actually need Special:EntitiesWithoutDescription and Special:EntitiesWithoutLabel at all. They have lost most of their usefulness as Wikidata grew: if you want to find items without a label in a given language, the result set is in most cases too large to be useful. A tool like https://tools.wmflabs.org/wikidata-terminator/ is needed to find the items where a label or description actually matters.
The one remaining use case for the special pages is doing this for properties. There the number of results is manageable and useful, so we can optimize for this case and remove the item case.

The idea behind restricting to Properties is: Without wb_entity_per_page, joining against the wb_terms table is inefficient. It would still work for Properties though, because there aren't that many of them.
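A simplified sketch of the two query shapes, using Python's sqlite3 module and a hypothetical, cut-down schema (`page`, `entity_per_page`, `terms`) rather than the real MediaWiki and Wikibase tables. The first query joins through the integer mapping that wb_entity_per_page provided; the second derives the numeric ID from `page_title` with REPLACE() and restricts the scan to the property namespace (assumed here to be 120). The exact production query is not shown in this task, so treat this only as an illustration of the shape:

```python
import sqlite3

# Assumption: namespace number for properties; the schema below is simplified.
PROPERTY_NS = 120

SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE entity_per_page (epp_entity_id INTEGER, epp_entity_type TEXT, epp_page_id INTEGER);
CREATE TABLE terms (term_entity_id INTEGER, term_entity_type TEXT,
                    term_language TEXT, term_type TEXT, term_text TEXT);
CREATE INDEX terms_entity ON terms (term_entity_id, term_entity_type);
"""

# Old style: anti-join through the integer mapping table.
OLD_QUERY = """
SELECT epp.epp_entity_id
FROM entity_per_page AS epp
LEFT JOIN terms AS t
  ON t.term_entity_id = epp.epp_entity_id
 AND t.term_entity_type = epp.epp_entity_type
 AND t.term_type = 'label' AND t.term_language = ?
WHERE epp.epp_entity_type = 'property' AND t.term_entity_id IS NULL
ORDER BY epp.epp_entity_id
"""

# New style: derive the numeric ID from page_title ("P31" -> 31) with REPLACE(),
# limited to the property namespace so the number of rows stays small.
NEW_QUERY = """
SELECT CAST(REPLACE(p.page_title, 'P', '') AS INTEGER) AS entity_id
FROM page AS p
LEFT JOIN terms AS t
  ON t.term_entity_id = CAST(REPLACE(p.page_title, 'P', '') AS INTEGER)
 AND t.term_entity_type = 'property'
 AND t.term_type = 'label' AND t.term_language = ?
WHERE p.page_namespace = ? AND t.term_entity_id IS NULL
ORDER BY entity_id
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for page_id, num in enumerate([17, 31, 569], start=1):
        conn.execute("INSERT INTO page VALUES (?, ?, ?)", (page_id, PROPERTY_NS, f"P{num}"))
        conn.execute("INSERT INTO entity_per_page VALUES (?, 'property', ?)", (num, page_id))
    # Only P31 has an English label.
    conn.execute("INSERT INTO terms VALUES (31, 'property', 'en', 'label', 'instance of')")
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    old = [r[0] for r in conn.execute(OLD_QUERY, ("en",))]
    new = [r[0] for r in conn.execute(NEW_QUERY, ("en", PROPERTY_NS))]
    print(old, new)  # [17, 569] [17, 569]
```

Both queries return the same properties without an English label; the difference lies in how much work the database does to get there, which matters once the set of pages grows from thousands of properties to millions of items.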

hoo created this task. Oct 7 2016, 1:00 PM
Restricted Application removed a project: Patch-For-Review. Oct 7 2016, 1:00 PM

How big is the difference between the two SQL queries? Can't we optimize it more? Is the difference really that relevant for two special pages that nobody uses anyway?

hoo added a comment. Oct 7 2016, 1:29 PM

How big is the difference between the two SQL queries?

Significant. The old query joined on an indexed integer column, while the new one joins on REPLACE() applied to a column, so an index on that column cannot be used for the join.
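To illustrate why the REPLACE() join is costlier, here is a small sketch using SQLite's `EXPLAIN QUERY PLAN` rather than MySQL. The `terms` table and its `term_entity_id` / `term_full_entity_id` columns are a simplified, hypothetical stand-in for wb_terms. Comparing an indexed column directly lets the planner search the index, while wrapping an indexed column in REPLACE() leaves it no choice but to scan every row:

```python
import sqlite3

# Hypothetical, cut-down stand-in for wb_terms: an integer ID column and a
# prefixed string ID column ("P31"), both indexed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE terms (term_entity_id INTEGER, term_full_entity_id TEXT,
                    term_type TEXT, term_text TEXT);
CREATE INDEX terms_entity_id ON terms (term_entity_id);
CREATE INDEX terms_full_entity_id ON terms (term_full_entity_id);
""")


def plan(sql, params):
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN, one step per row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Equality on the indexed integer column: SQLite can search the index.
indexed = plan(
    "SELECT term_type, term_text FROM terms WHERE term_entity_id = ?", (31,)
)

# REPLACE() wrapped around an indexed column: the index on
# term_full_entity_id cannot be used, so the whole table is scanned.
wrapped = plan(
    "SELECT term_type, term_text FROM terms "
    "WHERE CAST(REPLACE(term_full_entity_id, 'P', '') AS INTEGER) = ?",
    (31,),
)

if __name__ == "__main__":
    print(indexed)
    print(wrapped)
```

Plan wording varies between SQLite versions, and MySQL's `EXPLAIN` output looks different, but the index-search versus full-scan distinction carries over.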

Can't we optimize it more?

I can't see how we can optimize it more with the current wb_terms table structure.

Is the difference really that relevant for two special pages that nobody uses anyway?

Probably not; that's why we are removing support for Items. The new query might be bad enough to be used harmfully: anyone able to trigger it for items could cause a scan of several million rows.

thiemowmde closed this task as Resolved. Mar 25 2017, 2:43 PM

I believe this was resolved with https://gerrit.wikimedia.org/r/314555.